PREFACE

Fifty  years  ago  a  large  section  of  the  general  public  were  not 
only  uninterested  in  what  we  now  call  the  social  problem,  but  they 
scarcely  gave  a  thought  to  the  existence  of  such  a  problem.  They 
felt  vaguely  perhaps,  during  periods  of  acute  distress  due  to  lack  of 
employment, that all was not well and they thought the Government
or possibly the big landowner was to blame, but only the
more enlightened realized the complexity of the body politic and
how fearfully and wonderfully it is made. To-day all this is changed,
and comparatively few imagine that a single panacea (the prohibition
of drink, the nationalization of land, or a levy on capital)
will  cure  all  evils. 

The  very  fact  that  nearly  the  whole  civilized  world  has  given 
itself  up  for  over  four  years  to  the  destruction  of  life  and  the  dragging 
down  of  the  social  fabric  in  all  countries  on  so  vast  a  scale  has 
led  to  a  surfeit  and  a  reaction  in  which  thoughtful  men  are  eager 
to  take  part  in  proclaiming  again  a  common  brotherhood  and  in 
building  a  better  world.  Those  who  have  always  been  interested 
in  this  kind  of  architecture  welcome  the  change  of  spirit,  but  they 
also  recognize  the  difficulty  of  the  task  undertaken  and  the  need 
for no little mental effort to second the good-will, which is the first
essential  for  success.  To  pull  down  no  teacher  is  needed,  but  we 
must  learn  to  build. 

This  leads  one  to  the  subject  of  the  present  book.  The  man  who 
wishes  his  work  to  stand  must  make  sure  of  its  foundations.  He 
cannot afford to rest satisfied, as too often the politician and social
worker  do,  with  wild  and  ill-informed  generalizations  where  more 
exact  knowledge  is  possible,  and  there  are  few  human  problems  in 
the discussion of which some acquaintance with the proper treatment
of statistics is not in the highest degree necessary.

Most people, however, are suspicious of figures. They imagine
that  quantitative  considerations  must  of  necessity  deaden  all 
feeling  for  the  purely  aesthetic  or  qualitative  spirit  which  is  the 
very  life  of  the  phenomena  observed  or  measured.  But  this  surely 
need  not  be  the  case.  Kepler,  when  he  succeeded  in  translating 
the  motions  of  the  planets  into  the  language  of  number  was  not,  we 
believe,  the  less  but  rather  the  more  enamoured  of  the  beauty  and 
order  with  which  the  whole  of  creation  is  clothed. 

A  second  reason  for  suspicion  is  that  partisans  of  one  school  or 
another  with  more  push  than  principle  sometimes  trade  upon  the 
general ignorance of statistics to 'prove' their own pet theories,
while  others  no  less  enthusiastic  lead  the  credulous  public  into  the 
ditch,  not  with  malice  intent,  but  because  they  are  really  blind 
themselves  to  the  right  interpretation  of  the  figures  they  so  glibly 
quote. 

Although  a  concern  in  social  questions  led  the  present  writer  in 
the  first  instance  to  study  the  theory  of  statistics,  there  is  no  reason 
why  this  bias  should  prevent  the  book  being  of  service  to  those  who 
wish  to  know  something  of  its  application  in  other  directions,  seeing 
that  the  general  principles  underlying  the  theory  are  the  same  in 
all  cases,  and  illustrations  have  been  taken  from  any  field,  biological, 
economic,  medical,  etc.,  just  as  they  suited  the  immediate  purpose 
in  view. 

The author makes no claim to any originality: he is no more
than  a  student  seeking  to  put  together,  with  some  kind  of  system 
and  as  he  understands  them,  the  simpler  and  more  important  ideas 
he  has  gathered  from  other  sources.  The  matter  is  entirely  the 
work  of  others,  the  manner  only  is  his  own,  and  he  will  be  happy 
to receive criticism if thereby he may learn more. His chief qualification
for writing is that he has had to worry through most of
his  difficulties  alone,  and  consequently  he  knows  where  another 
student  is  likely  to  be  in  trouble  better  perhaps  than  the  kind  of 
writer  who  is  so  quick  as  to  be  able  to  see  through  things  at  a  glance 
or,  failing  that,  so  fortunate  as  to  be  able  to  borrow  immediate 
light  from  others. 

The  book  is  divided  into  two  parts.  Practically  all  the  first  part 
should be well within the understanding of the ordinary person.
Part II. is more mathematical, but an effort has been made throughout
to explain results in such a way that the reader shall gain a
general  idea  of  the  theory  and  be  able  to  apply  it  without  needing 
to  master  all  the  actual  proofs.  The  whole  is  meant,  not  as  an 
exhaustive  treatise,  but  merely  as  a  first  course  introducing  the 
reader  to  more  serious  works,  and,  since  real  inspiration  is  to  be 
found  nowhere  so  surely  as  at  the  source,  it  is  intended  to  encourage 
and  fit  him  to  pursue  the  subject  further  by  consulting  at  least  the 
most  important  original  papers  referred  to  in  the  text,  only  enough 
references being given to awaken curiosity. With the same intention
a short chapter is inserted after the Appendix by way of suggesting
a few of the sources of statistics likely to be of interest to
the  social  student. 

Some  living  writers,  notably  Professor  Karl  Pearson,  have 
contributed  so  largely  to  the  development  and  application  of 
statistics  that  it  is  impossible  to  write  upon  the  subject  at  all  without 
incorporating  large  parts  of  their  work,  and  the  least  one  can  do 
is  gladly  to  record  the  benefit  and  pleasure  one  has  received  from 
them.  The  author's  indebtedness  to  the  two  most  important 
English  text-books — Yule's  Theory  of  Statistics  and  Bowley's 
Elements  of  Statistics — will  be  evident  also  to  any  one  who  knows 
these  books,  for  they  became  so  familiar  through  constant  study 
that  he  fears  he  may  have  drawn  upon  them  unconsciously  even 
to  the  point  of  plagiarism  in  places. 

Finally,  he  wishes  specially  to  acknowledge  the  kindness  of  four 
friends: Mr. Peter Fraser, Lecturer in Mathematics at Bristol University,
without whose encouragement in the early stages the work
would never have been attempted; Professor H. T. H. Piaggio,
University  College,  Nottingham,  and  Mr.  A.  W.  Young,  sometime 
Lecturer  at  the  Sir  John  Cass  Technical  Institute,  London,  whose 
criticisms  and  suggestions  were  most  valuable ;  and  Professor 
W.  P.  Milne,  of  Leeds  University,  who,  both  as  a  practical  teacher 
and  as  Editor  of  this  series,  ungrudgingly  gave  his  help  and  advice. 

D. C. J.

PART I
CHAPTER I

INTRODUCTORY

Early  Historical  Beginnings.  Statistics,  more  or  less  valuable, 
have  been  compiled  in  most  civilized  countries  from  very  early 
times.  One  reason  for  doing  this  on  a  large  scale  has  been  to 
ascertain  the  man-power  and  material  strength  of  the  nation  for 
military or fiscal purposes, and we read in the Old Testament of
such  censuses  being  taken  in  the  case  of  the  Jews,  while  among  the 
Romans  also  it  was  a  common  practice. 

In England, as economic terms began to be used and their meanings
analysed, and especially during the period when the mercantile
system prevailed, and the Government endeavoured so far as was
practicable to direct industry into channels such that it would add
most to the power of the realm, men tried frequently to base arguments
for social and political reform upon the results of figures
collected.  A  distinct  advance  had  been  made  in  the  seventeenth 
century  when  mortality  tables  were  drawn  up  and  discussed  by  Sir 
William  Petty  and  Halley,  the  famous  astronomer,  among  others, 
and  their  labours  prepared  the  way  for  a  more  scientific  treatment 
of statistical methods, especially at the hands of one, Süssmilch, a
Prussian  clergyman,  who  published  an  important  work  in  1761. 

It  is  almost  true  to  say,  however,  that  until  the  time  of  the  great 
Belgian,  Quetelet  (1796-1874),  no  substantial  theory  of  statistics 
existed.  The  justice  of  this  claim  will  be  recognized  when  we 
remark  that  it  was  he  who  really  grasped  the  significance  of  one 
of the fundamental principles, sometimes spoken of as the constancy
of great numbers, upon which the theory is based. A simple illustration
will explain the nature of this important idea: Imagine
100,000 Englishmen, all of the same age and living under the same
normal  conditions — ruling  out,  that  is,  such  abnormalities  as  are 
occasioned  by  wars,  famines,  pestilence,  etc.  Let  us  divide  these 
men  at  random  into  ten  groups,  containing  10,000  each,  and  note 
the age of every man when he dies. Quetelet's principle lays
down that, although we cannot foretell how long any particular
individual will live, the ages at death of the 10,000 added together,
whichever group we consider, will be practically the same. Depending
upon this fact insurance companies calculate the premiums
they must charge, by a process of averaging mortality results recorded
in the past, and so they are able to carry on business without
serious risk of bankruptcy.
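
The principle can be tried on invented figures. The sketch below is an illustration only: the distribution of ages at death (roughly bell-shaped about 70 years, with a spread of 12 years) is an assumption made for the example and is not mortality data, and all the names are hypothetical.

```python
import random

def group_totals(n_people=100_000, n_groups=10, seed=0):
    """Split simulated ages at death into random groups and total each group."""
    rng = random.Random(seed)
    # Assumed (not observed) ages at death, kept between 0 and 105 years.
    ages = [min(105.0, max(0.0, rng.gauss(70.0, 12.0))) for _ in range(n_people)]
    rng.shuffle(ages)                      # division into groups at random
    size = n_people // n_groups
    return ages, [sum(ages[i * size:(i + 1) * size]) for i in range(n_groups)]

ages, totals = group_totals()
mean_total = sum(totals) / len(totals)
# Largest gap between any two group totals, relative to the mean total.
relative_spread = (max(totals) - min(totals)) / mean_total
# Gap between the longest and shortest individual lives, relative to the mean age.
individual_spread = (max(ages) - min(ages)) / (sum(ages) / len(ages))

print(f"group totals differ by {relative_spread:.2%} of their mean")
print(f"individual ages differ by {individual_spread:.0%} of their mean")
```

Under these assumptions the individual ages vary widely while the ten group totals differ from one another by only a small fraction of their size, which is the constancy an insurance office relies upon.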

As  a  distinguished  statistician  once  said,  '  By  the  use  of  statistics 
we  obtain  from  milliards  of  facts  the  grand  average  of  the  world.' 
But  if  the  average  resulting  from  our  observations  were  subject  to 
violent  fluctuation  as  we  passed  from  one  set  of  facts  to  another 
cognate  set  there  would  be  little  satisfaction  in  finding  it.  It  is 
the  comparative  constancy  of  the  average,  if  the  number  of  our 
observations  is  large  enough,  which  makes  it  so  important,  as 
Quetelet  observed,  for  although  the  idea  was  not  altogether  new  he 
first  realized  how  wide  an  application  it  had  and  how  fruitful  of 
practical  results  it  might  prove. 

Quetelet  was  born  in  Ghent,  and  taught  mathematics  in  the 
College  there  in  his  early  youth.  After  graduating  as  Doctor  of 
Science  he  became  Professor  of  Mathematics  in  Brussels  Athenaeum 
when  only  twenty-three  years  old,  and  later  he  was  made  Director 
of  the  Brussels  Observatory,  in  the  foundation  of  which  he  had 
taken  a  leading  part.  In  1841  he  was  appointed  President  of  the 
Central  Commission  of  Statistics,  where  he  was  in  a  position  to 
render  valuable  assistance  to  the  Belgian  Government  by  his  advice 
on  important  social  questions.  He  initiated  the  International 
Statistical  Congress,  which  has  served  to  bring  together  the  leading 
statisticians  of  all  countries,  and  the  first  meeting  was  held  in  1853 
at Brussels. His death occurred at the ripe age of seventy-eight.

Some  idea  of  the  extent  of  Quetelet's  statistical  researches  may 
be gathered from the titles of his chief works: (1) Sur l'homme et
le développement de ses facultés, ou essai de physique sociale (1835);
(2) Lettres . . . sur la théorie des probabilités appliquée aux sciences
morales et politiques (1846); (3) Du système social et des lois qui le
régissent (1848); (4) L'Anthropométrie, ou mesure des différentes
facultés de l'homme (1871).

In his writings he visualizes a man with qualities of average
measurement, physical and mental (l'homme moyen), and shows
how  all  other  men,  in  respect  of  any  particular  organ  or  character, 
can  be  ranged  about  the  mean  or  average  man,  just  as  in  Physics 
a  number  of  observations  of  the  same  thing  are  ranged  about 
the mean of all the observations. Hence he concluded that the
methods of Probability, which are so effective in discussing errors
of  observation,  could  be  used  also  in  Statistics,  and  that  deviations 
from  the  mean  in  both  cases  would  be  subject  to  the  binomial  law. 

Hain  in  Vienna  put  some  of  Quetelet's  ideas  to  good  service  in 
1852,  employing  a  superior  method  for  the  calculation  of  statistical 
variability.  Knapp  and  Lexis  in  Germany,  also  following  up 
Quetelet's  principles,  made  an  exhaustive  investigation  several  years 
later  of  the  statistics  of  mortality,  and  their  work  has  been  extended 
in  many  directions,  and  in  our  own  time  notably  by  Galton,  Karl 
Pearson,  and  Edgeworth. 

The  name  of  Sir  Francis  Galton  (1822-1911),  to  whose  work  as 
a  pioneer  the  science  of  Statistics  owes  so  much,  is  deserving  of 
even  greater  honour  than  it  has  yet  received.  Founder  of  the  School 
of  Eugenics,  Galton  himself  came  of  famous  stock,  being  grandson 
of  Erasmus  Darwin  and  a  cousin  to  Charles  Darwin.  He  studied 
medicine  in  early  youth,  but  after  graduating  at  Cambridge  his 
attention  was  turned  to  exploration,  and  the  Royal  Geographical 
Society awarded him a gold medal on the results of his investigations
in South-West Africa. His first great work on heredity was
not  published  till  1869,  after  he  had  already  earned  distinction  in 
other  directions,  for  he  was  elected  a  Fellow  of  the  Royal  Society 
in  1860.  Alive  with  new  ideas,  marvellously  patient  and  persistent 
in  bringing  them  to  the  test  of  observation — qualities  essential  for 
real  scientific  research — he  set  himself  to  inquire  into  the  laws 
governing  the  transmission  of  characteristics,  physical  and  mental, 
from  one  generation  to  another.  Large  tracts  of  this  ground  have 
since  been  carefully  explored  and  mapped  out  by  the  school  of 
his  great  successor,  Karl  Pearson,  who  has  originated  formulae  for 
testing the extensive anthropometrical and biological data collected.
Largely as a result of their work it is now widely recognized
that  '  the  whole  problem  of  evolution,'  as  Professor  Pearson  himself 
has  well  said, '  is  a  problem  in  vital  statistics — a  problem  of  longevity, 
of  fertility,  of  health,  and  of  disease,  and  it  is  as  impossible  for  the 
evolutionist  to  proceed  without  statistics  as  it  would  be  for  the 
Registrar-General  to  discuss  the  national  mortality  without  an 
enumeration  of  the  population,  a  classification  of  deaths,  and  a 
knowledge  of  statistical  theory.' 

Logical  Development.  The  best  way  to  approach  the  study  of 
any  subject,  if  one  had  time,  would  be  along  the  lines  of  its  historical 
development,  but  these  lines  seem  so  often  to  diverge  from  the 
main theme, like branches from the parent stem of a tree, that
when one tries to describe them the general effect is apt to be somewhat
confusing. It is therefore usually the custom to adopt a
logical  rather  than  a  historical  sequence,  but  it  may  assist  the  reader 
to  see  the  connection  between  the  two  and  the  unity  which  embraces 
the  whole  if  we  now  briefly  trace  the  natural  growth  of  the  subject, 
suggesting  the  steps  we  might  expect  it  logically  to  take.  This  we 
have  tried  to  keep  in  view  as  nearly  as  possible  in  the  succeeding 
chapters,  except  that  the  order  may  have  been  altered  here  and 
matter may have been omitted or inserted there as reason and
the elementary nature of the work dictated:

1.  Owing  to  the  difficulty  which  the  mind  experiences  in  grasping 
a  large  mass  of  figures,  the  necessity  for  an  average  arises  to  sum 
up  shortly  the  character  of  the  mass,  and  various  kinds  of  averages 
are  proposed. 

2.  An  average  proves  insufficient  alone  to  define  the  whole  scheme 
of  observations,  and  other  constants  are  invented  to  measure  their 
spread  or  dispersion  about  the  average. 

3. Considerations of space and the desire for some kind of system
lead further to the formation of tables with the observations classified
in ordered groups.

4. The formation of these tables suggests the possibility of a
graphical representation of the numbers in the different groups to
bring out the nature of their distribution.

5. The impossibility of dealing with a whole population results
in the selection of samples, and the comparison of one sample with
another introduces the subject of random errors.

6. The closer examination of this subject leads us into the domain
of mathematical probability and discovers the probability curve, or
normal curve of error, first formulated in connection with the study
of errors of observation.

7. This same curve serves in the sequel to describe a certain
important type of statistical distribution, in which each observation
is determined by a multitude of so-called chance causes pulling this
way and that, so that it is impossible to foretell what the resultant
effect will be.

8. The failure of the normal curve to describe other common distributions,
especially those which are unsymmetrical in character,
leads to the development of skew varieties of curves which will
fit them.

9. The extent of connection between one set of data and a possibly
related set is a natural subject for inquiry giving rise to the
theory of correlation.


CHAPTER  II 

MEASUREMENT,   VARIABLES,    AND   FREQUENCY   DISTRIBUTION 

Measurement.  There  are  two  fundamental  characteristics  which 
pertain  to  nearly  all  measurement :  it  is  (1)  relative  :  it  involves 
a  comparison  between  one  magnitude  and  another  of  the  same  kind, 
and  (2)  approximate  :  the  comparison  in  practice  cannot  be  made 
with  absolute  exactness. 

A man's height, for example, is stated to be 5 ft. 8¾ in., but this
would convey little to one who did not know how long a foot was
and how long an inch was. The first step in the measurement is
made by comparing the man's length with a certain constant
length previously agreed upon as a standard or unit, namely, a
'foot'; he is placed to stand up against a scale which is divided
up into feet, and the highest point of his head is seen to come
somewhere between the 5 ft. line and the 6 ft. line: he is therefore
longer than five of these units, set end to end, but not so long
as six of them. To carry the measurement a stage further a smaller
unit has to be introduced; each foot length of the scale is subdivided
into twelve equal parts called inches, and the top of the
man's head is found to come somewhere between the 5 ft. 8 in.
line and the 5 ft. 9 in. line: he is therefore over 5 ft. 8 in., but
not quite 5 ft. 9 in. in height. For the next stage in the measurement
each inch of the scale has to be further subdivided into quarter-inches,
and the top of the man's head is found to come somewhere
between the 5 ft. 8 in. 3 qu. in. line and the 5 ft. 9 in. line; moreover
it is nearer, let us suppose, to the former line than to the latter.
In this case, then, we say that the man's height or length is 5 ft.
8¾ in., measured to the nearest quarter inch.

In measurement the decimal notation has very obvious advantages,
because each unit is always divided into ten equal parts to
get the next smaller unit. Thus a weight of 7 kilogr. 5 hectogr.
3 decagr. 8 gr. 4 decigr. 3 centigr. can be expressed at once in
grammes, namely 7538.43 gr.; hence if we were measuring to the
nearest decagramme, the result would be expressed as 754 decagr.;
to the nearest decigramme, it would be 75384 decigr., etc.
Similarly, a length of 12 kilom. 7 metres 2 centim. can be written
12007.02 metres, or, in kilometres, 12.00702 kilom., or, to the nearest
decametre, 1201 decam., and so on.
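
The weight example may be checked mechanically. What follows is a minimal sketch, not part of the original text: the table of unit sizes and the function names are hypothetical, and everything is held as whole centigrammes so that no rounding error creeps in before the final step.

```python
# Size of each unit in centigrammes; each is ten times the next smaller one.
CENTIGR_PER_UNIT = {
    "kilogr": 100_000, "hectogr": 10_000, "decagr": 1_000,
    "gr": 100, "decigr": 10, "centigr": 1,
}

def to_centigr(parts):
    """Total of a mixed-unit weight, expressed in whole centigrammes."""
    return sum(CENTIGR_PER_UNIT[unit] * count for unit, count in parts.items())

def to_nearest(total_centigr, unit):
    """Express a weight to the nearest whole `unit` (halves rounded up)."""
    size = CENTIGR_PER_UNIT[unit]
    return (total_centigr + size // 2) // size

weight = to_centigr({"kilogr": 7, "hectogr": 5, "decagr": 3,
                     "gr": 8, "decigr": 4, "centigr": 3})
nearest_decagr = to_nearest(weight, "decagr")
nearest_decigr = to_nearest(weight, "decigr")
print(weight, nearest_decagr, nearest_decigr)
```

The digits of the weight never change; choosing a coarser unit only decides where they are cut off and rounded.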

The  mere  act  of  counting  things  of  a  like  kind  is,  in  a  sense, 
measurement of a primitive type, one thing being the unit, though
the  measurement  may  in  many  such  cases  be  exact ;  for  example, 
we  may  count  the  number  of  persons  in  a  room  exactly.  Even  in 
this  type  of  case,  however,  the  counting  or  measuring  cannot 
always  be  done  accurately,  but  the  inaccuracy  arises  from  lack  of 
precision  and  uniformity  in  definition  rather  than  from  want  of 
power  in  the  measuring  instrument  itself  :  e.g.  in  determining  the 
population  of  a  city,  inaccuracies  may  arise  because  of  failure  to 
define  exactly  the  boundaries  of  the  city,  or  the  time  at  which  the 
census is to be taken, or how to deal with the migration of the inhabitants
from or into the city, and with births and deaths during
the  actual  time  of  numbering. 

Variables.  By  a  variable  is  meant  any  organ  or  character  which 
is  capable  of  variation  or  difference  in  size  or  kind.  The  difference 
may be measurable as in the case of head-length, height, temperature,
etc., or not directly measurable as in the case of colour, intelligence,
occupation, etc. Further, the variation, when measurable, may
be  continuous,  or  it  may  take  place  only  by  integral  steps,  omitting 
intermediate  values :  population,  for  example,  can  never  go  up  or  down 
by  less  than  one,  but  if  temperature  is  to  change  from  60  degrees  to 
61  degrees  it  must  pass  continuously  through  every  intermediate 
state  of  temperature  between  60  degrees  and  61  degrees. 

In dealing with a measurable variable sometimes we are interested
not so much in its actual value at a particular instant as in
the  change  which  has  taken  place  in  its  value  during  some  specified 
interval,  but  to  gauge  fairly  the  amount  of  this  change  it  is  necessary 
to  measure  it  relative  to  the  original  value  of  the  variable.  For 
example,  if  we  are  told  that  the  wages  of  a  certain  person  have 
gone  up  during  the  year  to  the  extent  of  3d.  an  hour,  we  cannot 
say  whether  this  is  much  or  little  to  him  until  we  know  what  his 
wages  were  originally.  The  addition  would  be  relatively  much  less 
if he were a skilled patternmaker earning 1s. 6d. an hour than it
would be if he were a chainmaker earning only 6d. an hour.* This
point can be met by stating, not simply the change in the value of
the variable, but the ratio of the new value to the old. For instance,
the patternmaker in the above example has had his wages increased
in the ratio of 1s. 9d. to 1s. 6d. It is important to notice that
this form of measurement is quite independent of the particular
units used; if we take 1d. as unit, the ratio = 21/18 = 7/6, and if
we take 1s. as unit, the ratio = 1¾/1½ = 7/6 just as before.

[* Wages to-day are, of course, much higher; the above figures are only hypothetical.]
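
The independence of the ratio from the unit chosen can be verified with exact fractions. This is a small sketch with hypothetical variable names, assuming only the pre-decimal relation of twelve pence to the shilling.

```python
from fractions import Fraction

PENCE_PER_SHILLING = 12

old_pence, new_pence = 18, 21          # 1s. 6d. and 1s. 9d. an hour

# Ratio of new wage to old wage with the penny as unit.
ratio_in_pence = Fraction(new_pence, old_pence)

# The same wages with the shilling as unit: 1 3/4 and 1 1/2.
old_shillings = Fraction(old_pence, PENCE_PER_SHILLING)
new_shillings = Fraction(new_pence, PENCE_PER_SHILLING)
ratio_in_shillings = new_shillings / old_shillings

print(ratio_in_pence, ratio_in_shillings)
```

Changing the unit multiplies both wages by the same factor, which cancels when one is divided by the other.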

There  are  other  ways  of  measuring  this  change  in  the  value  of 
a  variable.  One  of  the  commonest  is  to  express  it  as  a  percentage 
of the original value; thus the patternmaker's increase is at the
rate of 3/18 × 100, or 16⅔ per cent., which is simply the ratio of
increase in wage to previous wage multiplied by 100. The multiplier,
100, is quite an arbitrary factor, but it has obvious advantages: among
others, it works well with the decimal notation and it often serves
to put the result into a form which is greater than unity instead of
leaving it as a fraction. Again, a man who gets a dividend of £25
on an investment of £500 receives interest at the rate of 25/500 × 100,
or  5  per  cent.  ;  in  other  words,  this  is  the  rate  at  which  his  capital 
accumulates  if  the  interest  is  added  to  it  instead  of  being  spent. 
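
Both percentages follow from the same rule, as the short sketch below shows; the function name is hypothetical, and exact fractions are used so that 16⅔ is not blurred by rounding.

```python
from fractions import Fraction

def percent_of_original(change, original):
    """Change expressed as a percentage of the original value."""
    return Fraction(change, original) * 100

wage_rise = percent_of_original(3, 18)       # 3d. added to 1s. 6d. (18d.)
dividend_rate = percent_of_original(25, 500)  # £25 on £500

print(wage_rise, dividend_rate)   # the first is the fraction 50/3, i.e. 16 2/3
```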

Annual  birth  rates  and  death  rates,  on  the  other  hand,  are  best 
expressed  per  thousand  of  the  population,  as  estimated,  say,  at 
the  middle  of  the  year  in  question  ;  e.g.  the  birth  rate  of  the  United 
Kingdom in 1911 was 24.4 per thousand, and the death rate was
14.8 per thousand, which is equivalent to 244 and 148 per 10,000
of  the  population  respectively.  If  we  could  assume  the  birth 
and  death  rates  to  remain  constant  from  year  to  year,  and  if  we 
could  afford  to  leave  migration  out  of  account,  the  population 
would  be  subject  to  exactly  the  same  law  of  increase  as  capital 
accumulating  at  compound  interest  [see  Appendix,  Note  1],  thus  : — 

1. If P be the original population, and if the annual net increase
be at the rate of 25 per thousand, then

the population in 1 year's time  = P × (1.025)
        „         2 years' time  = P × (1.025)²
        „         3 years' time  = P × (1.025)³
        „         n years' time  = P × (1.025)ⁿ.

2. If £P be the original capital, and if the annual increase be at
the rate of 2½ per cent., then

the capital in 1 year's time  = P × (1.025)
      „        2 years' time  = P × (1.025)²
      „        3 years' time  = P × (1.025)³
      „        n years' time  = P × (1.025)ⁿ.
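
The two cases obey one and the same law, which a few lines of code make plain. This is a sketch under the text's own assumptions (a constant rate and no migration); the function name and the starting figure of 1,000 are hypothetical.

```python
def grow(initial, rate, years):
    """Value after `years` of growth at a constant annual `rate`."""
    return initial * (1 + rate) ** years

per_thousand = 25 / 1000     # net increase of population, 25 per thousand
per_cent = 2.5 / 100         # interest on capital, 2 1/2 per cent.

population = [grow(1000, per_thousand, n) for n in range(4)]
capital = [grow(1000, per_cent, n) for n in range(4)]

for n, (p, c) in enumerate(zip(population, capital)):
    print(n, round(p, 3), round(c, 3))
```

A rate of 25 per thousand and a rate of 2½ per cent. are the same multiplier, 1.025, so the two columns agree year by year.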

Lest  we  may  seem  to  have  laboured  to  make  plain  what  is  really 
a  simple  idea,  it  may  be  remarked  that  quite  frequently  confusion 
arises with regard to percentage even in reputable quarters. As an
illustration of the kind of mistake which, without thinking, is easily
made, the following argument has been taken from a monthly
circular sent out a little while ago to the members of the Boilermakers'
Society by their Secretary: Since July 1914, wages have
risen 15 per cent., the cost of living has gone up 45 per cent., therefore
the workers' real wages have fallen 30 per cent. This same argument
was quoted shortly after in one of the leading articles of The Manchester
Guardian under the heading 'Prices and Wages,' and again
in The Labour Leader tersely as truth 'In a Nutshell,' but in
neither instance did it seem to have occurred to the writer that it
was inaccurate. It may be worth while for the sake of clearness to
show what the statement should have been:


                 Wages.   Cost of    Ratio of Wages to    Same Ratio
                          Living.    Cost of Living.      multiplied by 100.

July 1914         100       100          1                     100
October 1916      115       145          115/145                79

Since 115/145 × 100 is roughly 79, this calculation shows that 'real
wages' had fallen only about 21 per cent. (100 − 79 = 21), and not
30 per cent. as stated, between the two dates.
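
The corrected reckoning and the mistaken one can be set side by side. The sketch below uses only the figures quoted above; the function and variable names are hypothetical.

```python
def real_wage_index(wage_index, cost_of_living_index):
    """Wages measured against the cost of living, with the base date = 100."""
    return wage_index / cost_of_living_index * 100

real_wages_1916 = real_wage_index(115, 145)
correct_fall = 100 - real_wages_1916     # fall in real wages, per cent.
mistaken_fall = 45 - 15                  # the circular's subtraction of percentages

print(round(real_wages_1916), round(correct_fall), mistaken_fall)
```

The error lies in subtracting two percentages that should have been divided: the rise in prices applies to the whole of the new wage, not merely to the old one.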

Index  Numbers.  A  very  important  case  of  variables  changing 
with  time  appears  in  the  discussion  of  changes  in  the  value  of 
money  as  measured  by  the  movement  of  prices  of  commodities, 
introducing  the  notion  of  an  index  number.  For  example,  supposing 
the  wholesale  price  of  beef  was  6d.  a  lb.  at  one  date,  8d.  a  lb.  at 
another date, and 5½d. a lb. at a third date, the change might be
exhibited  as  in  the  following  table  : — 


                  1st Date.   2nd Date.   3rd Date.

Price of beef        6d.         8d.        5½d.
Index number         100         133         92

Here  100,  133,  and  92  are  called  index  numbers,  the  price  at  the 
first  date  being  taken  as  a  standard  and  denoted  by  100,  while 
the  prices  at  the  other  two  dates  are  altered  proportionally,  so  that 

6 : 8 : 5½ = 100 : 133 : 92.
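
The proportion is easily computed. The following is a minimal sketch, with a hypothetical function name, in which the price at the chosen base date is called 100 and every other price is scaled in the same proportion and rounded to the nearest whole number.

```python
def index_numbers(prices, base=0):
    """Scale `prices` so that the price at position `base` becomes 100."""
    standard = prices[base]
    return [round(price / standard * 100) for price in prices]

beef_pence_per_lb = [6, 8, 5.5]          # prices at the three dates
beef_index = index_numbers(beef_pence_per_lb)
print(beef_index)
```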

Index  numbers  calculated  on  this  principle  have  been  published 
systematically for several years by Mr. A. Sauerbeck (in the Journal
of the Royal Statistical Society up to January 1913, and continued
afterwards  in  The  Statist  under  the  supervision  of  Sir  George  Paish) 
and  in  The  Economist. 

In  Sauerbeck's  index  numbers  the  average  wholesale  prices  of 
forty-five commodities for the eleven years 1867-77 are taken as
the  standard,  being  denoted  each  by  100  as  above,  and  the  prices 
of  the  same  commodities  for  any  other  year  are  then  written  as 
percentages  of  these  standard  prices.  The  commodities  chosen  are 
various: food of all kinds (cereals, meat, potatoes, rice, butter,
sugar, coffee, tea), minerals (including coal), textiles, and sundries
(including hides, leather, tallow, palm oil, olive oil, linseed,
petroleum,  soda,  soda  nitrate,  indigo,  timber).  Articles  of  similar 
character  are  grouped  together  ;  naturally  no  class  is  exhaustive, 
but  the  selection  is  a  fairly  representative  one.  A  sort  of  general 
average is then formed by combining all the results, and the movement
of this average is taken to measure changes in the value of
money.  An  example  will  make  clear  the  way  in  which  an  index 
number  for  each  group  and  the  general  average  are  obtained. 

The  index  number  for  each  separate  commodity  may  be  first 
calculated  thus  : — 


Price of English Wheat.

Years.       Price per Qr.     Index Number.
               s.   d.
1867-77        54    6             100
1912           34    9              64
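
The wheat index can be reproduced by first reducing both prices to pence. This is a sketch with hypothetical names, assuming twelve pence to the shilling.

```python
def to_pence(shillings, pence):
    """Convert a price in shillings and pence to pence alone."""
    return 12 * shillings + pence

standard_price = to_pence(54, 6)     # average for 1867-77
price_1912 = to_pence(34, 9)

wheat_index_1912 = round(price_1912 / standard_price * 100)
print(standard_price, price_1912, wheat_index_1912)
```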

Now  forming  similar  index  numbers  for  each  of  the  eight  vegetable 
and  cereal  foods  and  combining  them  together,  we  have  : — 

Index Numbers for Vegetable and Cereal Foods.

Years.     (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   Total.   Average.

1867-77    100   100   100   100   100   100   100   100    800       100
1912        64    68    70    79    83    85    74   101    624        78

The figures in the last column but one are obtained by simply
adding the figures in the eight previous columns, and, dividing these
results by eight, we get the average index number for the group
in 1912 as a percentage of that in the standard years 1867-77.
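
As a check on the figures, the same two steps (adding the eight index numbers and dividing by eight) can be written out as a short Python sketch. The variable names are ours, and the rounding to a whole number is assumed from the printed result.

```python
# The eight 1912 index numbers for vegetable and cereal foods, as tabulated.
veg_cereal_1912 = [64, 68, 70, 79, 83, 85, 74, 101]

group_total = sum(veg_cereal_1912)                        # last column but one
group_index = round(group_total / len(veg_cereal_1912))   # last column

print(group_total, group_index)   # 624 78
```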

Treating all the other commodities in the same way we ultimately
get index numbers for all the different groups and for all
commodities combined as follows:

Index Numbers for different Groups and for all Commodities.

Group.                         No. of Commodities.    1867-77.    1912.
Vegetable and cereal foods              8               100         78
Animal foods                            7               100         96
Sugar, coffee, and tea                  4               100         62
All food                               19               100         81
Minerals                                7               100        110
Textiles                                8               100         76
Sundries                               11               100         82
All commodities                        45               100         85

The  index  number  for  '  All  Food  '  is  obtained  by  summing  the 
nineteen  index  numbers  for  the  separate  commodities  which  are 
included  in  this  class  and  dividing  the  result  by  19.  Similarly  the 
general  index  number  for  all  commodities  is  obtained,  not  by 
adding  the  numbers  for  the  different  groups  and  dividing  by  the 
number  of  groups,  but  by  adding  the  forty-five  index  numbers  of 
all the separate commodities and dividing the result by 45.
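
The distinction drawn here, between averaging the forty-five commodities and averaging the groups, can be illustrated with a short Python sketch. It takes the 1912 group figures as 78, 96, 62, 110, 76 and 82 for groups of 8, 7, 4, 7, 8 and 11 commodities; the group labels used as dictionary keys are our own assumption. Since the group figures are themselves rounded, weighting them by the number of commodities only approximates the true sum of the forty-five separate index numbers.

```python
# (number of commodities, 1912 group index); the key names are assumed labels.
groups_1912 = {
    "vegetable_and_cereal": (8, 78),
    "animal": (7, 96),
    "sugar_coffee_tea": (4, 62),
    "minerals": (7, 110),
    "textiles": (8, 76),
    "sundries": (11, 82),
}

n_commodities = sum(n for n, _ in groups_1912.values())

# Weighting each group by its size approximates "add all 45 and divide by 45".
weighted_mean = sum(n * idx for n, idx in groups_1912.values()) / n_commodities

# Averaging the six group figures directly gives a different answer.
simple_mean = sum(idx for _, idx in groups_1912.values()) / len(groups_1912)

# 'All Food' is the first three groups taken together (19 commodities).
food_keys = ["vegetable_and_cereal", "animal", "sugar_coffee_tea"]
food_n = sum(groups_1912[k][0] for k in food_keys)
food_mean = sum(groups_1912[k][0] * groups_1912[k][1] for k in food_keys) / food_n

print(n_commodities, round(weighted_mean), simple_mean, round(food_mean))
```

The weighted figure rounds to the published 85, while the simple average of the six groups is 84; the gap would be wider if the groups differed more in size.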

In  The  Economist  the  average  prices  of  twenty-two  commodities 
for  the  years  1901-5  are  taken  as  the  standard,  being  denoted 
each  by  100,  and  the  prices  of  the  same  commodities  for  any  other 
year  are  then  written  as  percentages  of  these  standard  prices  ;  the 
sum  of  these  percentages  is  taken  as  the  index  number,  and  it  is 
a simple matter to divide by 22 if we wish to get the average
percentage change. The following table explains the method of
calculation  : — 

Index Numbers representing Prices of Commodities.

Date.               Cereals     Other     Textiles.   Minerals.   Miscel-     Total.    Index No.
                    and Meat.   Foods.                            laneous.              (Total ÷ 22).
1901-5                 500        300        500         400         500       2200        100
End of Dec. 1916      1294        553       1124.5       824.5      1112       4908        223

In this table five commodities are included under the head of
'Cereals and Meat,' three under 'Other Foods,' and so on. The
numbers in the last column are obtained by dividing those in the
previous column by 22.
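
The Economist calculation can be retraced in the same way. The sketch below is illustrative only: the list name is ours, the five group totals for the end of December 1916 are taken as 1294, 553, 1124.5, 824.5 and 1112, and the final rounding is an assumption consistent with the tabulated 223.

```python
# Group totals at the end of December 1916, in the order of the table columns.
group_totals_1916 = [1294, 553, 1124.5, 824.5, 1112]

N_COMMODITIES = 22   # 5 + 3 + 5 + 4 + 5 commodities in the five groups

total_1916 = sum(group_totals_1916)               # the published index number
average_1916 = round(total_1916 / N_COMMODITIES)  # average percentage of 1901-5

print(total_1916, average_1916)   # 4908.0 223
```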

It  is  clear  that  what  is  at  bottom  the  same  principle  may  be 
applied in any case of a variable changing with time when we wish
to  measure  the  extent  of  the  change,  so  that  the  use  of  index  numbers 
is  not  confined  to  the  problem  of  prices.  We  shall  return  again  to 
discuss  one  or  two  further  points  in  connection  with  the  same 
subject  in  the  Chapter  on  '  Averages.' 

Frequency  Distribution.  So  far  we  have  been  thinking  more 
particularly of the change which an individual variable, or a
collection of such variables, may undergo in the course of time, or the
difference  between  two  values  which  the  same  variable  may  have 
at  two  different  instants  of  time,  and  how  to  measure  it.  Now 
the  science  of  Statistics  is  based  upon  the  study  of  the  crowd 
rather  than  of  the  individual,  although  observations  on  individuals 
have  to  be  made  before  they  can  be  combined  together  to  produce 
the  crowd,  just  as  individual  income-tax  schedules  have  to  be 
completed  and  combined  before  the  balance-sheet  of  the  State  can 
be  drawn  up.  As  we  pass  from  one  individual  to  another  there 
may be great differences in the organ or character observed (hence
the word variable already introduced), but in the mass these
differences are merged together and lose their individual importance:
it  is  rather  their  resultant  effect  we  seek  to  measure.  In  order 
therefore  to  discover  this  effect  it  is  necessary  to  make  a  collection 
of  individual  observations  and  to  analyse  the  results.  Now  if  our 
ultimate  conclusions  are  to  be  safe  the  number  of  observations 
must  be  considerable,  and  in  order  to  be  able  to  cope  with  them 
and  reduce  them  to  some  sort  of  system  the  first  step  in  the  analysis 
consists  in  arranging  them  in  different  classes  according  to  the 
value  of  the  variable  under  consideration. 

It is to be noted that now we are concerned with changes in the
value of a variable as we pass from one individual to another at the
same period of time and under the same general conditions, and not
with the change in a variable in the same individual occurring with
the lapse of time. We wish, for example, to draw a distinction
between (1) the change in wages as we pass from one man to another
at the same time in the same trade, and (2) the change in wages of
the same man, or class of men, in the same trade occurring in a
given period of time; in the first case we want to find the amount
of diversity within the trade at some stated time, and in the second
our object is to discover whether an improvement has taken place
in the wages of a particular individual or a particular trade with
the passage of time.

In  picturing  variation  of  the  first  type  the  conception  arises  of  a 
frequency  distribution  where  the  observations  are  distributed  in 
ordered  groups,  with  a  number  corresponding  to  each  showing 
how  many,  or  how  frequent,  are  the  individuals  possessing  the  type 
of  variable  or  character  which  defines  that  group.  More  generally, 
if  a  series  of  measurements  or  observations  of  a  variable  y  are 
made  corresponding  to  a  selected  series  of  another  variable  x  we 
get  a  distribution,  which  becomes  a  frequency  distribution  when  y 
represents  the  frequency  of  events  happening  in  a  particular  way, 
or  of  individuals  corresponding  to  a  particular  value  of  some 
common  variable  or  character,  represented  by  x.  Thus  (1)  the 
boys  in  a  school  might  be  grouped  according  to  their  intelligence  : 
so  many,  dull ;  so  many,  of  ordinary  intelligence  ;  and  so  many, 
bright  or  above  the  ordinary.  Again  (2)  in  an  inquiry  into  the 
housing  of  the  people  in  any  town  or  district  it  would  be  necessary 
to  draw  up  a  table  showing  the  number  or  frequency  of  existing 
tenements  with  one  room,  the  frequency  of  tenements  with  two 
rooms,  the  frequency  of  tenements  with  three  rooms,  and  so  on. 
Once  more  (3)  a  zoologist,  wishing  to  discover  whether  crabs  of  a 
certain  species  caught  in  one  locality  differ  in  any  remarkable  way 
from  members  of  the  same  species  caught  in  another  locality,  might 
start  by  making  measurements  of  the  length  of  carapace  or  upper 
shell  for  crabs  of  like  sex  in  the  two  places  and  then  proceed  to 
form  frequency  tables  for  each,  setting  out  the  frequency  of  crabs 
for  which  the  carapace  length  lies,  say,  between  5  and  6  millimetres, 
the  frequency  with  length  between  6  and  7  millimetres,  the  frequency 
with  length  between  7  and  8  millimetres,  and  so  on.  He  would 
then  have  in  these  tables  some  basis  for  comparing  the  specimens 
caught in the two localities.
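
The zoologist's frequency table can be sketched in Python as follows. The carapace lengths below are invented purely for illustration and are not taken from any real series of measurements; the classes are taken as '5 and under 6 millimetres,' '6 and under 7 millimetres,' and so on, so that each length is assigned by its whole-number part.

```python
import math
from collections import Counter

# Hypothetical carapace lengths in millimetres (illustrative values only).
lengths_mm = [5.2, 5.9, 6.1, 6.4, 6.8, 7.0, 7.3, 7.7, 6.5, 5.5]

# A length x falls in the class "k and under k+1" where k = floor(x).
frequency = Counter(math.floor(x) for x in lengths_mm)

for lower in sorted(frequency):
    print(f"{lower} and under {lower + 1} mm: {frequency[lower]}")
```

A second table formed in the same way for the other locality would then stand beside this one for comparison.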

The  three  illustrations  just  used  give  three  different  types  of 
distribution  corresponding  to  the  three  types  of  variable  to  which 
attention  has  been  drawn  before.  In  the  first,  where  the  variable 
or  character  observed  is  not  measurable,  doubt  will  sometimes 
arise  as  to  the  appropriate  class  in  which  individuals  should  be 
placed  who  seem  to  be  on  the  border  line  between  dulness  and 
mediocrity  or  between  mediocrity  and  brilliance,  so  that  accurate 
classification  will  greatly  depend  upon  what  is  called  the  '  personal 
equation' of the observer. The second illustration corresponds
to the case where the variable changes not continuously but by
unit stages; the choice of classes in such a case depends little
upon the observer unless the unit is very small compared to the
total range of variability; for example, a tenement might either
definitely have two rooms or it might have three rooms, but it
clearly could not be put down as having 2¼ rooms or 2½ rooms:
in  other  words,  the  only  natural  classification  is  so  many  tenements 
with  two  rooms,  so  many  with  three  rooms,  so  many  with  four 
rooms,  and  so  on,  though  here  too  some  confusion  might  arise 
through  failure  to  define  clearly  what  is  '  a  room.'  In  the  third 
type, where we can conceive of the continuous variation of the
character  under  observation,  there  would  be  nothing  surprising  in 
the  appearance  of  any  value  of  the  variable  between  the  lowest 
and  highest  values  observed  ;  the  choice  of  suitable  limits  for  the 
several  groups  becomes  therefore  in  this  case  rather  a  delicate 
matter  which  requires  careful  judgment. 

We  shall  begin  the  next  chapter  with  some  general  remarks 
upon  the  subject  of  classification  and  tabulation. 


CHAPTER  III 

CLASSIFICATION    AND   TABULATION 

No  part  of  Statistics  is  of  more  importance  than  that  which  deals 
with  classification  and  tabulation,  and  it  is  the  one  part  for  which 
no  very  precise  rules  can  be  given.  A  neat  arrangement  of  ideas 
in  the  mind,  capacity  to  express  them  clearly,  and  patience  are 
indispensable,  but  experience  alone  will  convince  one  of  the  extreme 
care  which  must  be  exercised  if  blunders  are  to  be  avoided  and 
time  is  to  be  saved  in  the  long  run.  This  has  to  be  emphasized 
because  most  people,  until  they  have  tried  and  failed,  imagine 
that  to  arrange  things  in  classes  and  in  tables  is  a  straightforward 
proceeding  involving  no  great  thought  or  trouble. 

Abundant matter of a statistical character is published
periodically in Blue-books, Government Reports, Reports of Local
Authorities, Directors of Education, Medical Officers of Health, Chief
Constables, Employers' Associations, Trade Unions, Co-operative
Societies, etc., but it needs a trained intelligence as a rule to
assimilate it and turn it to further advantage. The larger the scale upon
which any inquiry is made, the more valuable should the results
be, granted that equal accuracy is possible on the large as on the
small scale, but it is fairly clear that mistakes of various kinds
have also much more chance of creeping into a large work than into
a small one. To appreciate the various and numerous possibilities
of error when the scope is wide it is enough to read the
introductions to the Registrar-General's Reports on the Census from decade
to decade; this should also impress the student with the care that
is necessary if he proposes to use such material for the investigation
of some other problem. It may seem a comparatively simple task
to abstract two sets of figures from a Census Report, to establish
a one-to-one correspondence between them, and to make deductions
therefrom, but such figures when taken from their context will
sometimes lead to absolutely unsafe, if not false, conclusions. The
exact meaning and limitations of any data can only be properly
appreciated by one who has been closely in touch with the persons
who have collected them, and it is therefore important, before
attempting to re-classify or re-tabulate any old statistics for a new
purpose, to read very carefully through the notes made by the
original compilers.

Perhaps  the  best  advice  that  can  be  given  to  any  one  in  this 
connection  is  that  he  should  embark  upon  some  small  inquiry 
which  will  necessitate  the  collection  of  statistics  for  himself ;  the 
final result of his efforts may seem disappointing, but the
experience he will gain will be invaluable. Ideas for such an inquiry will
occur  to  him  if  he  reads  through  some  authoritative  work  on  social 
questions,  e.g.  Beveridge's  Unemployment,  the  decennial  Census 
Reports,  or  The  Minority  Report  on  the  Poor  Law  (1905).  But  he 
must  read  with  an  open  and  critical  mind,  questioning  particularly 
the  foundation  for  all  statements  as  to  cause  and  effect  which  may 
be  made.  A  few  simple  hints  may  be  useful  as  to  method  of 
procedure. 

When  he  thinks  he  has  discovered  some  subject  of  interest  which 
would appear to deserve examination, it will be well to put it
down on paper in order to get it clearly defined, because a precise
written statement is likely to carry one further than a shadowy
idea somewhere at the back of the mind which is hardly
formulated at all. When the actual collection of statistics is begun
it  will  almost  certainly  be  found  that  it  is  impossible  to  solve  the 
original  problem  contemplated  ;  but  that  need  not  prevent  further 
progress — what  is  important  is  that  the  limitations  should  be 
exactly  realized,  and  this  will  be  impossible  unless  the  original 
problem  is  clearly  presented  side  by  side  with  the  nearest  solution 
obtainable. 

The  problem  stated,  the  next  thing  is  to  set  down  categorically 
a  number  of  questions,  the  answers  to  which  are  to  be  the  raw 
material  for  the  solution  of  the  given  problem.  For  the  answers 
let  us  assume  the  inquirer  is  dependent  upon  the  goodwill  of  others, 
either  employers,  or  trade  union  secretaries,  or  public  officials. 
The  questions  in  that  case  must  be  clearly,  concisely,  and  courteously 
phrased,  and  must  not  be  capable  of  more  than  one  interpretation. 
In  number  they  should  be  few  and  in  character  not  inquisitorial ; 
moreover,  the  replies  should  be  obtainable  without  any  great  labour 
on the part of the persons approached. Here again it will be found
that the questions first set down are not all satisfactory: one will
be too vague; another, though clear enough, may involve a
considerable search through a mass of other matter before it can be
properly answered; while to another it might be impossible to give
an exact reply in any case. Revision and amendment may therefore
be necessary in the light of the first replies received, and the
inquirer will begin to see at this stage how far the solution to his
original problem is really possible.

When  the  bulk  of  the  returns  have  come  in  they  should  be  critically 
examined  one  by  one.  A  number  will,  for  one  reason  or  another, 
be  worthless,  and  they  must  be  discarded  ;  as  for  the  remainder, 
if  the  questions  were  well  chosen,  the  answers  should  not  be  difficult 
to  interpret  and  classify  ;  the  most  successful  questions  are  those 
to  which  a  simple  '  yes  '  or  '  no  '  in  reply  gives  all  the  information 
required  ;  numerical  answers  are  less  easy  to  deal  with,  especially 
if  there  is  the  least  chance  of  misunderstanding  on  either  side  as 
there  often  is,  for  example,  in  the  case  of  observations  which  are 
on  the  border  line  between  two  classes. 

Tables  should  then  be  drawn  up  and  the  headings  to  the  different 
columns  of  the  tables  should  state  concisely  and  exactly  what  the 
figures  below  represent.  So  far  as  possible  any  one  should  be  able 
readily  to  grasp  their  general  meaning  without  being  obliged  to 
wade  through  a  page  or  two  of  written  explanation  ;  if  any  heading 
cannot  be  clearly  expressed  in  a  few  words  it  may  be  helped  out 
by  a  further  note  at  the  bottom  of  the  page,  but  too  many  such 
notes  are  to  be  avoided. 

Finally,  a  summary  should  be  made  of  the  various  conclusions 
suggested  by  a  study  of  the  tables.  Some  of  the  points  raised  in 
the  course  of  the  inquiry  will  perhaps  be  only  incidental  to  the 
main  problem  under  discussion,  but  may  still  deserve  a  passing 
reference. It will also be of advantage to follow up the summary
by any recommendations which can be fairly based on the
conclusions obtained, when the problem is such that recommendations
are expedient, and, if ultimately the whole is of sufficient value to
be printed, emphasis can be introduced where necessary by suitable
variations in type.

For  this  part  of  the  work  considerable  judgment  is  necessary 
which  can  only  be  acquired  by  long  training — a  faculty  to  pick  out 
the  real  from  the  false  and  an  eye  to  distinguish  the  important  from 
the trivial. A sense of numerical proportion too is desirable
incidentally; one of our leading exponents on finance in a book dealing
with  the  meaning  of  money  uses  a  very  interesting  illustration  which 
is  perhaps  worth  quoting  here  to  show  how  even  an  acute  mind 
may  on  occasion  prove  itself  curiously  lacking  in  such  a  sense. 
He  is  seeking  to  show  how  the  credit  system  of  the  country  is  built 
upon  a  foundation  composed  of  a  little  gold  and  a  lot  of  paper  ; 
for this purpose he amalgamates together the balance-sheets of half
a dozen big banks, and proves that their liabilities on current and
deposit  account  amounted  at  a  certain  date  prior  to  1914  to  249 
million  pounds,  while  the  cash  in  hand  and  at  the  Bank  of  England 
was  43  millions.  Of  the  43  millions  he  estimates  that  roughly 
20  millions  would  be  cash  in  the  Bank  of  England,  and  further 
that  about  two-thirds  of  this  20  millions  would  be  represented  really 
by  securities  and  not  by  gold.  Hence  he  concludes  that  to  support 
this  vast  erection  of  credit  there  would  only  be  £6,666,666  of  actual 
gold.  Thus  after  talking  throughout  in  millions  the  author  closes 
by giving his result true apparently to a pound!
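
The point about numerical proportion may be made concrete with a small Python sketch. It is our own illustration, not the financial author's working: the helper `round_sig` is a hypothetical name, and the choice of one significant figure is simply an assumption about how much precision a rough estimate of '20 millions' and 'about two-thirds' can bear.

```python
import math


def round_sig(x, sig):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    digits = sig - 1 - math.floor(math.log10(abs(x)))
    return round(x, digits)


cash_at_bank = 20_000_000        # a rough estimate, in pounds
gold = cash_at_bank / 3          # two-thirds taken as securities, one-third as gold

print(int(gold))                 # 6666666, a figure 'true apparently to a pound'
print(round_sig(gold, 1))        # 7000000.0, all that the estimates can support
```

A result built from estimates stated in round millions is better reported as 'between 6 and 7 millions' than to the last pound.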

Much  may  be  learnt  as  to  methods  of  classification  and  the 
drawing  up  of  tables  by  a  careful  study  of  those  which  appear  in 
various  official  reports,  and  a  few  such  tables  are  reproduced  in 
the  pages  which  follow. 


Table (1). Condition as to Cleanliness of
School Children in Surrey.

5 years, 1908-12. 79,070 children inspected.

Cleanliness.               Per cent.
Above the average            15.4
Average                      76.5
Below average                 7.6
Much below average            0.5

Table (2). Condition as to Infectious Diseases of
School Children at Different Ages in Surrey (1913).

Age Groups inspected.              5-6        8-9       13-14      Total at
                                                                   All Ages.
Numbers inspected                 5,191      5,151      4,962       15,304

Proportion who before inspection
had suffered from:              per cent.  per cent.  per cent.   per cent.
Diphtheria                         1.3        3.5        5.4         3.4
Scarlet fever                      2.7        7.2       10.9         6.9
Measles                           55.3       79.3       84.6        72.9
Whooping cough                    41.8       56.4       54.3        50.9
German measles                     2.9        5.1        7.5         5.1
Chicken pox                       26.1       40.1       38.6        34.9
Mumps                             10.6       22.0       29.8        20.7
No infectious diseases            18.9        6.1        4.7        10.0
No definite information            3.3        2.2        0.9         2.2

Table (3). Height of School Children according to
District, Age, and Sex (1913).

Boys.
Age         Nos.          Average Height     Average Height in cms.
Groups.     measured.     in inches.         Surrey.     England and Wales.
5-6           2724           41.4             105.2          103.4
8-9           2578           47.8             121.4          120.4
13-14         2529           57.0             144.8          142.4

Girls.
Age         Nos.          Average Height     Average Height in cms.
Groups.     measured.     in inches.         Surrey.     England and Wales.
5-6           2467           41.3             104.9          102.6
8-9           2573           47.5             120.7          119.4
13-14         2433           57.9             147.1          144.2

The  first  four  are  taken  from  the  Annual  Report  of  the  School 
Medical  Officer  for  the  County  of  Surrey,  1913.  The  first  is  an 
example  of  single  tabulation  showing  the  distribution  according  to 
cleanliness of children inspected in the elementary schools. The
second is an example of double tabulation, showing the distribution
according to age of school children who at some period before
the date of inspection had suffered from certain infectious diseases.
The third is an example of quadruple tabulation, showing the
distribution of school children according to height, district, sex, and
age.  Thus  in  the  first  case  we  have  one  factor  brought  into  relief, 
viz.  cleanliness  ;  in  the  second  case  we  have  two  factors,  age  and 
disease  ;  in  the  third  case  we  have  four  factors,  height,  district, 
sex,  and  age. 

When  we  have  two  or  more  factors  tabulated  together  as  in  cases 
(2)  and  (3),  we  may  be  sometimes  led  to  discover  a  connection  of 
some  kind,  possibly  causal,  between  them,  and  the  search  for  such 
a  connection,  or  correlation  as  it  is  called,  represents  one  very  useful 
purpose to which tabulation may be put. Table (4) is an illustration
of this. It is the result of certain measurements carried out in
order  to  discover  the  effect  of  employment  out  of  school  hours  upon 
the  physical  condition  of  boys.  The  particular  factor  examined  as 
the  possible  cause  of  evil  in  this  connection  is  lack  of  sleep,  and 
the  figures  given  certainly  seem  to  warrant  a  closer  examination 
into the matter.

Table (4). Physical Condition of certain Boys according
to Hours of Sleep Obtained.

                                                                      Nutrition.
No. of Hours       No. of Boys    Average      Average      Percentage   Percentage   Percentage
Sleep obtained.    examined.      Height in    Weight in    above        average.     below
                                  inches.      lbs.         average.                  average.
7 to 8                 14           54.5         71.3          7.1         35.8         57.1
8 to 9                 80           55.4         73.9         10.1         65.9         24.0
9 to 10               296           56.4         79.3         15.3         64.5         20.2
10 to 11              280           57.9         83.2         22.8         66.5         10.7
11 to 12               50           59.0         87.0         22.0         68.0         10.0

Tables  (5)  and  (6)  are  two  illustrations  of  neat  tables,  containing 
a  large  amount  of  information  in  a  small  space,  set  out  in  such  a 
form  that  the  eye  can  easily  take  it  in — and  that  is  the  main  purpose 
of  tabulation.  These  examples  are  selected  from  the  Sixteenth 
Abstract  of  Labour  Statistics  of  the  United  Kingdom,  Cd.  7131. 

In  Table  (6)  note  the  classification  of  age  groups  :  it  is  not  '  5  to 
10  years,'  '  10  to  15  years,'  and  so  on,  but  '  5  and  under  10  years,' 
'  10  and  under  15  years,'  and  so  on.  This  removes  difficulties  at 
the border lines between two classes; the difficulties are not
completely removed, however, unless there is some understanding as
to  what  shall  constitute  under  any  particular  age.  Shall  it  be  six 
months  under,  or  one  day  under,  or  one  hour  under  ?  This  sort 
of  ambiguity  has  more  importance  in  some  cases  than  in  others. 
Suppose,  for  example,  we  were  classifying  men  according  to  their 
height: a group of the type '60 inches and under 62 inches,'
assuming that measurements were made to the nearest half-inch,
would really include all men who were '59¾ inches and under
61¾ inches'; because one who measured anything from 59¾ in.
to 60¼ in., being nearer to 60 in. than to 59½ in. measuring to
the nearest half-inch, would be registered as 60 in. in height, while
one who measured anything from 61¾ in. to 62¼ in., being nearer
to 62 in. than to 61½ in., would be registered as 62 in. in height.
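
The effect of measuring to the nearest half-inch upon the real limits of a class can be verified with a short Python sketch. The function names are hypothetical, and the sketch assumes that a height lying exactly midway between two half-inch marks is registered at the higher mark, a convention the text does not itself state.

```python
import math


def registered_height(true_height_in):
    """Height as recorded to the nearest half-inch (exact ties go upward)."""
    return math.floor(true_height_in * 2 + 0.5) / 2


def in_class(true_height_in, lower=60.0, upper=62.0):
    """Would this man be entered in the class 'lower and under upper' inches?"""
    return lower <= registered_height(true_height_in) < upper


for h in (59.7, 59.75, 59.8, 61.7, 61.75, 61.8):
    print(h, registered_height(h), in_class(h))
```

Under this convention the class '60 inches and under 62 inches' in fact receives every man whose true height is 59¾ inches and under 61¾ inches.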

Table (5). Classification of Overcrowded Tenements* in
England and Wales (1911).

Urban Districts.
Tenements           No. of Overcrowded    Occupants thereof.
with                Tenements.            No.           Percentage of
                                                        total population.
1 room                   56,290             206,022          0.7
2 rooms                 119,695             712,613          2.5
3 rooms                 107,892             847,937          3.0
4 rooms                  64,470             624,747          2.2
5 or more rooms          21,200             251,405          0.9

Rural Districts.
Tenements           No. of Overcrowded    Occupants thereof.
with                Tenements.            No.           Percentage of
                                                        total population.
1 room                    1,545               5,748          0.1
2 rooms                  15,397              91,458          1.2
3 rooms                  22,380             175,988          2.2
4 rooms                  17,341             167,969          2.1
5 or more rooms           4,700              55,585          0.7

Total.
Tenements           No. of Overcrowded    Occupants thereof.
with                Tenements.            No.           Percentage of
                                                        total population.
1 room                   57,835             211,770          0.6
2 rooms                 135,092             804,071          2.2
3 rooms                 130,272           1,023,925          2.8
4 rooms                  81,811             792,716          2.2
5 or more rooms          25,900             306,990          0.8

Table (6). Population grouped according to Age,
England and Wales (1911). Males.

                           Urban Districts.          Rural Districts.          All Districts.
Age Groups.                Number.     Per cent.     Number.     Per cent.     Number.     Per cent.
Under 5 years             1,517,432      11.3         418,681      10.6       1,936,113      11.1
5 and under 10 years      1,431,900      10.6         415,395      10.5       1,847,295      10.6
10 and under 15 years     1,341,586       9.9         406,045      10.3       1,747,631      10.0
15 and under 20 years     1,267,500       9.4         387,395       9.8       1,654,895       9.5
  (Under 20 years)                       41.2                      41.2                      41.2
20 and under 30 years     2,332,135      17.3         626,300      15.9       2,958,435      17.0
30 and under 40 years     2,094,934      15.5         542,370      13.7       2,637,304      15.1
40 and under 50 years     1,556,818      11.6         444,360      11.3       2,001,178      11.5
  (20 and under 50 years)                44.4                      40.9                      43.6
50 and under 60 years     1,042,868       7.7         333,368       8.4       1,376,236       7.9
60 and under 70 years       612,741       4.5         230,306       5.8         843,047       4.8
70 and upwards              296,246       2.2         147,228       3.7         443,474       2.5
  (50 years and upwards)                 14.4                      17.9                      15.2
Total                    13,494,160     100.0       3,951,448     100.0      17,445,608     100.0

[* For the purpose of the Census Reports 'ordinary tenements which have more
than two occupants per room, bedrooms and sitting-rooms included,' are considered
overcrowded.]

Another point to be noted is that in general people making
returns seem to have a psychological weakness for round figures,
so that a man in the neighbourhood of 40 years of age, for example,
is apt to record himself as actually 40 although he may really
be 39 or 41 years old. To diminish the error arising from this fact
it is usual, when not otherwise inconvenient, to fix the centres
of the class-intervals at round figures: e.g. to take '15 and under
25 years,' '25 and under 35 years,' etc., in preference to '20 and
under 30 years,' '30 and under 40 years,' etc. Where there is
any known bias in the data, as, for instance, in the familiar case
of certain women who consistently register themselves as younger
than they really are, a correction can be made in the final figures.
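
Why it helps to place the centres of the class-intervals at round figures can be seen from a short Python sketch. The function `class_lower_limit` is a hypothetical helper of our own, and the man aged 39 who records himself as 40 is simply the case mentioned in the text.

```python
def class_lower_limit(age, start, width=10):
    """Lower limit of the class containing `age`, for classes beginning at
    start, start + width, start + 2 * width, and so on."""
    return start + ((age - start) // width) * width


true_age, recorded_age = 39, 40

# Limits at round figures: '20 and under 30', '30 and under 40', ...
limits_round = (class_lower_limit(true_age, 20), class_lower_limit(recorded_age, 20))

# Centres at round figures: '15 and under 25', '25 and under 35', ...
centres_round = (class_lower_limit(true_age, 15), class_lower_limit(recorded_age, 15))

print(limits_round)    # (30, 40): the rounding moves the man into the wrong class
print(centres_round)   # (35, 35): the rounding leaves his class unchanged
```

With the centres at 20, 30, 40, etc., a man who rounds his age to the nearest ten will as a rule still fall in his proper class.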

In  any  frequency  distribution  where  we  wish  to  group  a  number 
of  observations  according  to  the  magnitude  of  some  common 
variable,  as  in  Table  (6)  a  number  of  males  grouped  according  to 
age,  the  question  arises — '  How  many  groups  should  there  be  ?  ' 
With  this  question  is  involved  also  the  size  of  the  corresponding 
class-interval, and this should be so large that, with possible
exceptions at either extremity of the table, there are a fair proportion of
observations  to  each  class  or  group  ;  and,  contrariwise,  it  should 
be  so  small  that  all  the  observations  in  any  one  group  may  be 
treated  practically  as  if  they  were  located  at  the  centre  of  the  group 
so  far  as  the  variable  in  question  is  concerned,  e.g.  it  should  be 
possible  to  treat  males  recorded  in  class  '  50  and  under  60  years,' 
where  the  interval  is  10  years,  as  if  they  were  all  of  age  55  years.  It 
will  be  found  in  general  that  a  number  of  groups  somewhere  in  the 
neighbourhood  of  20  is  the  most  satisfactory,  granted  that  the 
number  of  observations  is  reasonably  large,  although  in  some  cases 
it is impossible to split up the unit of class-interval, and we are
obliged to be satisfied with a smaller number of groups on this
account: Table (5) is a case in point where we are tied down to
one room as the class-interval. In Table (6) the class-interval
varies,  being  only  5  years  at  first,  and  afterwards  10  years,  but 
as  a  rule  the  labour  of  calculation  of  the  different  statistical  constants 
we  require  is  considerably  simplified  if  it  is  possible  to  keep  the 
size  of  the  class-interval  the  same  for  each  group. 
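
The requirement that all observations in a group may be treated as if located at its centre can be tested numerically. The Python sketch below uses a small set of invented ages (purely hypothetical data) and a helper of our own, `grouped_mean`, which replaces every observation by the mid-point of its class before averaging; equal class-intervals are assumed.

```python
from collections import Counter


def grouped_mean(values, start, width):
    """Mean computed as if every value sat at the centre of its class.
    Classes are 'start and under start + width', and so on upward."""
    frequency = Counter((v - start) // width for v in values)
    total = sum(n * (start + k * width + width / 2) for k, n in frequency.items())
    return total / len(values)


# Hypothetical ages in completed years (illustrative only).
ages = [22, 27, 31, 34, 38, 41, 45, 47, 52, 58, 63]

exact = sum(ages) / len(ages)
approx = grouped_mean(ages, start=20, width=10)

print(exact, approx)
```

The narrower the class-interval, the closer the grouped figure lies to the exact one, but the fewer the observations falling in each class; the choice of interval is a compromise between the two.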


CHAPTER   IV 

AVERAGES 

Common  Average  or  Arithmetic  Mean.  Let  us  consider  one  of  the 
commonest  meanings  of  the  term  average.  If  a  train  travels  a 
distance  of  180  miles  in  3  hours  we  say  that  it  has  been  moving 
at  60  miles  an  hour.  By  this  we  do  not  mean  that  its  speed  is 
always  60  m/h,  never  more,  never  less,  but  that  if  it  had  moved 
always  at  that  uniform  speed  it  would  have  accomplished  its 
journey  in  exactly  the  same  time.  As  a  matter  of  fact,  during 
some  instants  it  may  have  been  moving  at  a  much  slower  rate 
than  60  m/h,  but,  if  so,  it  must  have  made  up  for  this  slackness 
by  travelling  at  a  much  faster  rate  than  60  m/h  during  other 
instants,  so  that  on  the  whole  a  balance  was  effected,  and,  as  we 
say,  the  speed  averaged  out  at  60  m/h. 
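
The train illustration can be put into figures with a short Python sketch. The division of the journey into two stretches is our own invention, chosen only so that the whole amounts to 180 miles in 3 hours.

```python
# Hypothetical stretches of the journey as (miles covered, hours taken).
stretches = [(30, 1.0), (150, 2.0)]

total_miles = sum(miles for miles, _ in stretches)
total_hours = sum(hours for _, hours in stretches)

average_speed = total_miles / total_hours               # miles per hour
speeds = [miles / hours for miles, hours in stretches]  # speed on each stretch

print(speeds)          # [30.0, 75.0]
print(average_speed)   # 60.0
```

The train never needs to move at 60 miles an hour at all: the slower stretch at 30 is balanced by the faster at 75, each counted for the time it lasted.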

Again,  suppose  the  wages  of  three  men  are  :  A,  27s.  a  week  ; 
B,  18s.  a  week  ;  C,  30s.  a  week.  We  should  say  that  the  average 
wage  of  the  three  was  equivalent  to 

⅓(27+18+30)s. = 25s. a week.

In  other  words,  if  A,  B,  and  C  were  all  under  the  same  employer, 
and  if,  instead  of  paying  them  different  amounts,  he  wanted  to 
pay  them  all  equally,  he  would  have  to  give  each  man  25s.  a  week, 
assuming  that  his  total  wages  bill  was  to  remain  unaltered.  This 
method  of  measurement  gives  what  is  known  as  the  arithmetic 
mean,  or,  more  simply,  the  mean. 
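
The wage illustration can be checked in a few lines. The sketch below is only illustrative: the function name `arithmetic_mean` is our own, and exact fractions are used so that the check on the unaltered wages bill involves no rounding.

```python
from fractions import Fraction

def arithmetic_mean(values):
    """Return the common average: the total shared out equally."""
    values = list(values)
    return Fraction(sum(values), len(values))

# Weekly wages in shillings for A, B and C, as quoted in the text.
wages = [27, 18, 30]
mean_wage = arithmetic_mean(wages)

# Paying every man the mean leaves the employer's total bill unchanged.
assert mean_wage * len(wages) == sum(wages)
print(mean_wage)  # 25
```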

Once  more,  in  discussing  the  state  of  the  labour  market  as  regards 
different  trades,  when  we  wish  to  compare  one  with  another,  it  is 
not  the  actual  numbers  unemployed  in  each  trade  that  are  quoted, 
but  these  numbers  expressed  as  percentages  of  the  total  numbers 
employable  in  each  trade. 

In  each  of  these  three  cases  we  reduce  our  observations  or 
measurements  to  a  sort  of  common  denominator,  so  that  they  may  be 
mentally compared or contrasted more readily with other observations
of a similar character. Thus we have in mind a certain mean
train speed per hour, or mean wage per week, or mean percentage
out of work, as the case may be.

An  average  then  in  general  we  may  regard  as  one  of  a  class 
of  statistical  constants  (others  of  which  we  are  to  meet  later)  which 
concisely  label  a  set  of  observations  or  measurements  pertaining 
to  a  common  family.  It  is  designed  to  describe  the  family  type 
more  nearly  than  is  possible  by  observing  any  chance  member,  and  in 
value  it  should  therefore  come  somewhere  near  the  middle  of  the 
family  group,  so  that  if  the  individual  members  of  the  family 
chance  to  be  equal  each  to  each  in  respect  to  the  organ  or  character 
observed it should have the same value as they have. This constitutes
a test for the validity of any formula giving the average of a
set of observations : e.g. we might, if we wish, define the average
of three numbers, p, q, r to be, not $\tfrac{1}{3}(p+q+r)$ but

$$\sqrt{\tfrac{1}{3}(p^2+q^2+r^2)},$$

for (1) this formula, too, can be shown to give a number intermediate
in value between the greatest and least of the numbers p, q, r ;
also (2) if we put p=q=r=k (say), the formula reduces to

$$\sqrt{\tfrac{1}{3}(k^2+k^2+k^2)} = \sqrt{k^2} = k.$$
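
The two-part test just described lends itself to a mechanical check. The sketch below is an illustration only: `root_mean_power` is a hypothetical helper in which the power is left as a parameter (we assume the value 2 by default), and `passes_average_tests` simply applies conditions (1) and (2) to whatever formula is handed to it.

```python
def root_mean_power(values, order=2):
    """Hypothetical alternative average: root of the mean of the powers."""
    values = list(values)
    return (sum(v ** order for v in values) / len(values)) ** (1 / order)

def passes_average_tests(formula, values, k=7.0):
    """Apply the two checks from the text to a candidate averaging formula."""
    result = formula(values)
    # (1) The result must lie between the least and the greatest value.
    lies_between = min(values) <= result <= max(values)
    # (2) Equal members must be returned unchanged (up to rounding error).
    reproduces_equal = abs(formula([k] * len(values)) - k) < 1e-9
    return lies_between and reproduces_equal

print(passes_average_tests(root_mean_power, [27, 18, 30]))  # True
print(passes_average_tests(sum, [27, 18, 30]))              # False
```

A bare total fails both conditions, which is why it is rejected here, whereas the root-mean formula passes for these particular numbers; a single trial of this kind is of course a check and not a proof.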

Clearly  the  range  of  choice  for  the  definition  of  an  average  is 
infinite,  though  only  a  few  definitions  give  averages  which  have 
proved  their  utility  and  come  into  general  use.  Of  these  the  most 
important  is  the  common  mean  already  introduced,  with  its  ex- 
tension, the  weighted  mean,  but  at  least  two  others  deserve  special 
consideration,  the  median  and  the  mode. 

Median. In any observed distribution if all the individuals
can  be  arranged  in  order  of  magnitude  of  the  character  or  organ 
observed,  which  may  be  conveniently  done  when  they  are  not  very 
numerous,  the  median  organ  or  character  will  be  that  pertaining  to 
the  individual  half-way  along  the  series,  so  that  there  are  in  general 
an  equal  number  of  individuals  above  and  below  the  median. 
For  instance,  if  seven  boys  of  different  heights  be  placed  to  stand  in 
a  row,  the  tallest  first,  the  next  tallest  next,  and  so  on,  the  median 
height  is  the  height  of  the  fourth  boy  from  either  end.  If  there 
are  an  even  number  of  boys,  say  eight,  it  would  be  natural  to  take 
as  median  the  height  midway  between  that  of  the  fourth  and  that 
of  the  fifth  boy. 

When  the  items  are  numerous  they  are  frequently  grouped  into 
classes, as we have seen, such that all in the same class are reckoned
to have some value lying between the extreme limits of that class.
We should then, as before, halve the total number of observations
to fix the particular individual which defines the median organ or
character.  This  would  enable  us  to  pick  out  the  group  in  which 
the  median  lies,  and  on  reference  to  the  original  record  of  observa- 
tions, assuming  it  was  at  hand,  it  would  be  a  simple  matter  to 
identify  the  median. 

If  the  original  record  be  not  available,  however,  it  will  be  neces- 
sary to  proceed  to  get  the  best  value  we  can  for  the  median  in  some 
other way. Consider, for example, Table (7), showing the distribution
of marks obtained by 514 candidates in a certain examination.
We begin by rearranging the data in the manner shown below
Table (7). Now in accordance with the definition the median in
marks  should,  strictly  speaking,  be  midway  between  the  marks 
assigned  to  the  257th  candidate  and  the  marks  assigned  to  the 
258th  candidate  :  in  fact,  the  marks  corresponding  to  candidate 
number 257.5, if it were possible for such a candidate to exist.
But  we  are  ignorant  so  far  as  Table  (7)  goes  of  the  marks  gained 
by  either  the  257th  or  the  258th  candidate,  though  it  is  possible, 
by  the  simple  proportional  process  known  as  '  interpolation,'  to 
calculate  approximately  the  marks  we  require.  We  think  of  all 
the  candidates  as  forming  an  ordered  sequence,  ranged  one  after 
the  other  according  to  their  marks  just  like  the  boys  of  different 
heights,  and  the  table  shows  that  in  this  mental  picture 

the 231st candidate gets approximately 30 marks, while
the 318th candidate gets approximately 35 marks.

Hence candidate number 257.5, if one existed, ought to get a
number of marks somewhere between 30 and 35. But, in this
neighbourhood of the sequence,

a difference of (318 - 231) candidates corresponds to a difference
of 5 marks, therefore
a difference of (257.5 - 231) candidates corresponds to a difference
of $(\tfrac{5}{87} \times 26.5)$ marks.

Thus the marks obtained by candidate number 257.5 are approximately

$$30 + \tfrac{5}{87} \times 26.5 = 31.523,$$

and  this  may  be  taken  as  the  median. 

On examining the actual marks-sheet it was found that 252
candidates obtained 31 marks or less, and 273 candidates obtained
32 marks or less, so that the real median was 32, because this was
the number of marks gained by both the 257th and the 258th
candidates. The number 31.523 found above, however, would be
a good approximation to take for the median when all the information
at our disposal was that shown in Table (7).


Table (7). Marks obtained by 514 Candidates in a
Certain Examination.

Marks Obtained.     No. of Candidates.
   1 to  5                  5
   6 to 10                  9
  11 to 15                 28
  16 to 20                 49
  21 to 25                 58
  26 to 30                 82
  31 to 35                 87
  36 to 40                 79
  41 to 45                 50
  46 to 50                 37
  51 to 55                 21
  56 to 60                  6
  61 to 65                  3
  Total                   514

The  table  is  to  be  read  as  follows  : — 

5 candidates obtained 1, 2, 3, 4, or 5 marks,
9 candidates obtained 6, 7, 8, 9, or 10 marks,


and  so  on. 


By  straightforward  addition  it  can  evidently  be  rearranged  so 
as  to  read  thus  : — 


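  5 candidates obtained not more than  5 marks.
 14 candidates obtained not more than 10 marks.
 42 candidates obtained not more than 15 marks.
 91 candidates obtained not more than 20 marks.
149 candidates obtained not more than 25 marks.
231 candidates obtained not more than 30 marks.
318 candidates obtained not more than 35 marks.
397 candidates obtained not more than 40 marks.
447 candidates obtained not more than 45 marks.
484 candidates obtained not more than 50 marks.
505 candidates obtained not more than 55 marks.
511 candidates obtained not more than 60 marks.
514 candidates obtained not more than 65 marks.
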
It will be noted that in calculating the median no use is made of
the  marks  of  any  of  the  candidates  except  those  in  the  two  groups 
in  the  immediate  neighbourhood  of  the  median,  and  it  is  one  of 
the  great  advantages  of  this  average  that  it  can  be  found  when  an 
exact  knowledge  of  the  characters  of  the  more  extreme  individuals 
in  the  series  is  not  in  our  possession,  and  even  when  their  measure- 
ment is  impossible  :  it  is  enough  if  they  can  be  roughly  located. 
The  arithmetic  mean  on  the  other  hand  is  often  unduly  influenced 
by  abnormal  individuals  which  are  not  really  typical  of  the  popula- 
tion in  which  they  appear. 

Mode.  If  we  measure  or  observe  some  organ  or  character  for 
each  individual  in  a  given  population,  the  mode,  as  its  name  sug- 
gests, is  simply  the  organ  or  character  of  most  fashionable  or  most 
frequent  size.  A  large  draper,  for  example,  will  have  collars  of 
several  different  shapes  and  sizes  in  his  shop,  but  the  fashionable 
shape  and  the  predominant  size  correspond  to  the  mode  :  it  is  the 
mode  that  sells  most  readily,  and  the  intelligent  draper  will  always 
have  it  in  stock.  Again,  in  Table  (2),  the  disease  mode  or  fashion- 
able disease  among  certain  school  children  inspected  in  Surrey  in 
1913  was  measles,  for  a  greater  percentage  of  children  had  suffered 
from  measles  than  from  any  other  of  the  diseases  recorded. 

Now  when  the  variable  in  which  we  are  interested  is  '  discrete,' 
that  is,  when  it  changes  by  unit  steps,  leading  to  classes  like  '  tene- 
ments with  1  room,'  '  tenements  with  2  rooms,'  '  tenements  with 
3  rooms,'  and  so  on,  it  is  an  easy  matter  to  pick  out  the  class  of 
greatest  frequency  :  thus,  in  Table  (5)  there  are  more  overcrowded 
tenements  with  2  rooms  than  with  any  other  number  of  rooms 
in  the  urban  districts,  so  that  2  is  the  mode  so  far  as  this  character 
(number  of  rooms)  is  concerned,  whereas  in  the  rural  districts  3  is 
the  mode,  for  there  are  more  overcrowded  tenements  with  3  rooms 
than  with  any  other  number.  There  may  be  ambiguity,  however, 
in  determining  the  mode  in  this  way  for  a  grouped  frequency  dis- 
tribution when  we  are  dealing  with  an  organ  or  character  subject 
to 'continuous variation.' To cover such cases the modal value
has  been  defined  as  that  value  for  which  the  frequency  per  unit 
variation  of  the  organ  or  character  is  a  maximum.  The  precise 
significance  of  this  wording  will  only  be  appreciated  after  discussing 
frequency  curves  :  at  present  it  must  suffice  to  give  a  practical 
illustration  of  how  the  ambiguity  arises  and  calls  for  some  more 
refined  treatment. 

For this purpose turn again to the examination marks in Table (7),
from which it appears that the mode, if it is to be the marks obtained
by  the  greatest  number  of  candidates,  should  lie  in  the  group 
(31  to  35),  since  there  are  87  candidates  with  marks  between  these 
limits,  and  this  number  exceeds  that  in  any  other  group.  But 
how  are  we  to  decide  the  exact  point  in  the  interval  (31  to  35)  which 
is  to  correspond  to  the  mode  ?  Shall  it  be  33  ?  We  might  say 
'  yes  '  if  the  distribution  were  perfectly  symmetrical  on  either  side 
of  the  (31  to  35)  group,  but  if  we  examine  the  neighbouring  groups 
we  see  that  the  balance  leans  rather  more  heavily  to  the  (26  to  30) 
group  with  a  frequency  of  82  than  to  the  (36  to  40)  group  with  a 
frequency  of  79,  and  we  might  allow  for  this  by  interpolating  in 
some  way — ignoring,  of  course,  any  errors  which  may  occur  in  the 
frequencies  themselves  owing  to  the  observations  being  generally 
limited  in  number.  But  the  pull  in  the  direction  of  lower  marks 
becomes  still  more  pronounced  to  our  minds  when  we  contrast 
also  the  frequencies  in  the  next  groups  on  either  side,  namely 
58  and  50.  So  we  might  go  on  until  the  influence  of  the  whole 
field  of  observations  comes  into  action. 

Now  it  so  happened  that  in  this  particular  case  the  original 
marks-sheet  was  to  be  seen,  and  a  regrouping  of  the  candidates  as 
in  Table  (8)  makes  it  clear  that  the  value  found  in  this  way  for  the 
mode  may  be  artificially  displaced  sometimes  to  a  serious  extent 
by  the  particular  method  of  grouping  adopted.  Thus,  according 
to  this  new  arrangement,  the  mode  would  seem  to  lie  in  the  interval 
(28  to  32),  the  mid-value  of  which  differs  materially  from  33,  the 
mid-value of the previous maximum frequency group.


Table (8). Marks obtained by 514 Candidates in a
Certain Examination (Alternative Grouping).

Marks Obtained.     No. of Candidates.
   3 to  7                 10
   8 to 12                 17
  13 to 17                 35
  18 to 22                 56
  23 to 27                 47
  28 to 32                108
  33 to 37                 74
  38 to 42                 73
  43 to 47                 45
  48 to 52                 31
  53 to 57                 12
  58 to 62                  3
  63 to 67                  3
  Total                   514

[It should be observed that while an alteration of the grouping
may also affect the median, it does not affect it nearly to the same
extent : e.g. the median determined from Table (8) is 31.3, which
differs little from 31.5 the value obtained by the first grouping.]
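
The contrast between the two groupings can be reproduced directly from Tables (7) and (8). The sketch below is illustrative only; the helper names `modal_class` and `interpolated_median` are our own, and the median is again found by simple proportional interpolation between class upper limits.

```python
from itertools import accumulate

# (lowest mark, highest mark, number of candidates) for each class.
TABLE_7 = [(1, 5, 5), (6, 10, 9), (11, 15, 28), (16, 20, 49), (21, 25, 58),
           (26, 30, 82), (31, 35, 87), (36, 40, 79), (41, 45, 50),
           (46, 50, 37), (51, 55, 21), (56, 60, 6), (61, 65, 3)]
TABLE_8 = [(3, 7, 10), (8, 12, 17), (13, 17, 35), (18, 22, 56), (23, 27, 47),
           (28, 32, 108), (33, 37, 74), (38, 42, 73), (43, 47, 45),
           (48, 52, 31), (53, 57, 12), (58, 62, 3), (63, 67, 3)]

def modal_class(table):
    """Return (low, high, mid-value) of the class with the greatest frequency."""
    low, high, _ = max(table, key=lambda row: row[2])
    return low, high, (low + high) / 2

def interpolated_median(table):
    """Median by proportional interpolation between class upper limits."""
    cumulative = list(accumulate(freq for _, _, freq in table))
    target = (cumulative[-1] + 1) / 2
    if cumulative[0] >= target:
        raise ValueError("median lies in the first class")
    for i in range(1, len(table)):
        if cumulative[i] >= target:
            below = table[i - 1][1]  # upper limit of the class beneath
            width = table[i][1] - below
            return below + width * (target - cumulative[i - 1]) / table[i][2]

for name, table in (("Table (7)", TABLE_7), ("Table (8)", TABLE_8)):
    low, high, mid = modal_class(table)
    print(name, f"modal class {low} to {high} (mid-value {mid}),",
          f"median {interpolated_median(table):.1f}")
```

Run on these figures, the mid-value of the modal class shifts from 33 to 30 with the change of grouping, while the median moves only from about 31.5 to about 31.3.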

If,  again,  we  combine  the  results  of  our  two  groupings  to  find 
the  mode  we  might  be  tempted  to  conclude  that  it  lies  somewhere 
between  the  limits  31  and  32,  but  on  examining  the  original  records 
it  was  discovered  that  the  real  mode  was  28.  The  frequency 
distribution  of  candidates  in  this  neighbourhood  was  in  fact  very 
interesting  ;   it  ran  as  follows  : — 

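Number of candidates who obtained 25 marks = 14
Number of candidates who obtained 26 marks = 10
Number of candidates who obtained 27 marks =  6
Number of candidates who obtained 28 marks = 33
Number of candidates who obtained 29 marks = 17
Number of candidates who obtained 30 marks = 16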


The  explanation  of  this  peculiar  distribution  seemed  to  be  that 
28  marks  were  required  for  a  candidate  to  pass,  and  apparently  as 
many  candidates  as  possible  were  pushed  over  the  pass  line  :  if, 
on  the  first  marking,  a  candidate  was  found  to  want  only  one  mark 
to  pass,  the  examiner  presumably  looked  through  his  paper  again 
and  did  his  best  to  find  an  answer  which  by  kindly  treatment 
might  be  granted  an  extra  mark.  The  effect  of  this  leniency  was 
ultimately  to  leave  only  6  candidates  in  the  division  immediately 
below  the  pass  line,  and  to  swell  the  number  immediately  above 
to  33,  which  thus  made  28  easily  the  '  most  fashionable  '  mark  of 
any,  the  next  largest  group  of  candidates  being  only  21.  It  will 
be  observed  that  even  a  candidate  who  wanted  2  marks  to  pass 
was  treated  in  the  same  tolerant  fashion,  although  it  is  not  so 
easy,  of  course,  for  a  conscientious  examiner  to  discover  two  extra 
marks  as  it  is  to  discover  one  ;  and  if  the  candidate  is  3  marks 
below  the  pass  line  it  is  still  harder  to  give  him  the  necessary  lift 
to carry him over. Thus in the final list we find more candidates
with 26 marks than with 27, and still more with 25 than with 26.
If the above diagnosis is correct, and all marks-sheets tell the same
tale,  who  shall  again  say  that  examiners  do  not  temper  justice  with 
mercy  ? 

This  example  has  illustrated  fairly  clearly  the  difficulty  of  fixing 
the  mode  with  any  great  precision  by  mere  inspection  when  the 
individuals  are  arranged  in  groups,  the  value  of  the  variable  under 
discussion lying between prescribed limits for each group. While
it is possible to get a rough approximation to its value in this way,
we  conclude  that  for  a  really  satisfactory  determination  we  require 
some  method  which  makes  use  of  the  whole  distribution,  as  in  the 
determination  of  the  mean,  and  not  merely  of  the  portion  in  the 
supposed  neighbourhood  of  the  mode.  This  must  be  left  to  a  later 
chapter  ;  we  shall  only  point  out  before  passing  on  that  there 
may  sometimes  be  more  than  one  mode  in  a  given  frequency  dis- 
tribution just  as  there  may  be  more  than  one  fashionable  type  of 
collar  which  it  is  expedient  for  the  draper  to  stock  in  large  quan- 
tities. The  second  grouping  in  the  examination  example  suggests 
such  a  possibility,  for  it  will  be  noticed  that  the  frequencies  of 
candidates  do  not  rise  steadily  to  a  single  maximum  at  108  for 
class  (28  to  32),  and  then  fall  steadily  :  there  is  a  previous  rise  and 
fall  in  the  neighbourhood  of  class  (18  to  22). 

Weighted  Mean.  Let  us  suppose  a  farmer  employs  for  the 
harvest  5  men,  3  women,  and  4  boys.  In  estimating  the  amount 
of  work  they  can  do  in  a  given  time  it  is  clear  that  in  general  a 
woman  or  boy  cannot  be  reckoned  as  equal  to  a  man.  He  must 
therefore  decide  what  '  weight '  must  be  given  to  each  in  proportion 
to  a  man.  If  a  woman's  work  be  taken,  for  example,  to  be  three- 
quarters  as  effective  and  a  boy's  work  to  be  half  as  effective  as 
that  of  a  man,  we  have  as  the  appropriate  proportional  weights 

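$$1 : \tfrac{3}{4} : \tfrac{1}{2}, \quad\text{or}\quad 4 : 3 : 2.$$

Hence 5 men, 3 women, and 4 boys would on the average be equivalent
in output to

$$\left(5 + 3 \times \tfrac{3}{4} + 4 \times \tfrac{1}{2}\right)\text{ men} = \frac{4 \times 5 + 3 \times 3 + 2 \times 4}{4}\text{ men} = 9\tfrac{1}{4}\text{ men}.$$

An average of this type is called a weighted mean, 1, 3/4, and
1/2 being the weights, because they tell us what weight to give to
each separate worker in calculating the average.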
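
The farmer's reckoning may be set out as follows. The sketch is illustrative: the dictionary `WEIGHTS` and the function `man_equivalents` are names of our own choosing, and the weights are those assumed in the example above.

```python
from fractions import Fraction

# Weights relative to one man, as assumed in the text's example.
WEIGHTS = {"man": Fraction(1), "woman": Fraction(3, 4), "boy": Fraction(1, 2)}

def man_equivalents(workers):
    """Convert a mixed gang into the number of men giving the same output."""
    return sum(WEIGHTS[kind] * count for kind, count in workers.items())

gang = {"man": 5, "woman": 3, "boy": 4}
print(man_equivalents(gang))  # 37/4, i.e. 9 1/4 men

# The proportional weights 4 : 3 : 2 lead to the same figure.
assert man_equivalents(gang) == Fraction(4 * 5 + 3 * 3 + 2 * 4, 4)
```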

Let  us  consider  the  effect  such  weighting  has  in  general  upon  a 
mean,  and  for  this  purpose  we  shall  test  it  on  a  set  of  index  numbers 
measuring  rents  in  certain  groups  of  towns  in  1912,  as  given  in  a 
Report  on  the  Cost  of  Living  of  the  Working  Classes  issued  by  the 
Board of Trade (Cd. 6955).

Table (9). Mean Index Numbers of Rents for certain
Geographical Groups of Towns in 1912 (with reference
to Middle Zone of London as standard = 100).

Columns : (1) Geographical Group ; (2) Rents ; (3) No. of Towns
included in the Group ; (4) Each Group counting as 1 ; (5) Arbitrary
Weights ; (6) Approximate submultiples of Nos. in previous column.

(1)                                (2)     (3)     (4)     (5)     (6)
Northern Counties and Cleveland    66.0      9       1      27       3
Yorkshire (except Cleveland)       58.5     10       1      54       6
Lancashire and Cheshire            56.9     17       1      45       5
Midlands                           52.3     14       1     125      14
Eastern and East Midland Cos.      53.4      7       1      63       7
Southern Counties                  63.7     10       1      14       2
Wales and Monmouth                 64.8      4       1      22       2
Scotland                           62.0     10       1     178      20
Ireland                            51.7      6       1      55       6
Average                             ..     58.4    58.8    57.6    57.6

The first mean in the above table, 58.4, is obtained by multiplying
(or weighting) the mean rent of each geographical group by the
number of towns in the group, given in col. (3), adding the numbers
so obtained, and dividing the total by the total number of towns,
thus :

$$\frac{9(66.0) + 10(58.5) + \dots + 6(51.7)}{9 + 10 + \dots + 6}.$$


This  is  simply  the  arithmetic  mean  treating  each  town  as  unit. 

The second mean, 58.8, is obtained by adding the mean rents of
all the groups and dividing by the total number of groups, thus :

$$\frac{66.0 + 58.5 + \dots + 51.7}{1 + 1 + \dots + 1}.$$


This  is  the  arithmetic  mean  treating  each  geographical  group  as 
unit. 

The third mean, 57.6, is obtained by multiplying, or weighting,
the mean rent of each group by a perfectly arbitrary number given
in col. (5) ; the numbers selected were taken quite at random from
another column of figures in another Blue-book, and had no connection
whatever with the subject of rents ; this gives :

$$\frac{27(66.0) + 54(58.5) + \dots + 55(51.7)}{27 + 54 + \dots + 55}.$$

The last mean, 57.6, is obtained by choosing as weights any
numbers (and for simplicity we choose the smallest) as in col. (6)
which are very roughly proportional to the arbitrary weights used
in the last instance ; we thus get :

$$\frac{3(66.0) + 6(58.5) + \dots + 6(51.7)}{3 + 6 + \dots + 6}.$$

Now  the  first  of  these  means  is  clearly  the  most  satisfactory,  since 
it  is  the  result  of  very  properly  weighting  the  mean  rent  of  each 
group  of  towns  according  to  the  number  of  towns  the  group  con- 
tains. But  the  second  result  shows  that  if  we  are  ignorant  of  the 
number  of  the  towns  in  each  group  we  shall  not  be  very  far  out  in 
our  calculation  if  we  treat  them  all  as  of  equal  importance,  and  find 
the  simple  arithmetic  mean  of  the  mean  rents  in  the  nine  groups. 
We  can  even  go  further,  for  we  find,  from  the  third  and  fourth  results, 
that  by  weighting  the  mean  rents  in  the  various  groups  on  quite  a 
random  basis,  the  mean  we  get  still  does  not  differ  very  greatly  from 
the  best  value  first  found. 
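
The four means of Table (9) can be recomputed from its columns. In the sketch below the function `weighted_mean` and the labels in `weightings` are our own; the figures are those of cols. (2) to (6).

```python
def weighted_mean(values, weights):
    """Sum of weight x value divided by the sum of the weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

# Columns (2) to (6) of Table (9), one entry per geographical group.
rents = [66.0, 58.5, 56.9, 52.3, 53.4, 63.7, 64.8, 62.0, 51.7]
weightings = {
    "towns in group": [9, 10, 17, 14, 7, 10, 4, 10, 6],
    "each group as 1": [1] * 9,
    "arbitrary": [27, 54, 45, 125, 63, 14, 22, 178, 55],
    "rough submultiples": [3, 6, 5, 14, 7, 2, 2, 20, 6],
}
for label, weights in weightings.items():
    print(f"{label}: {weighted_mean(rents, weights):.1f}")
```

The four printed values, 58.4, 58.8, 57.6 and 57.6, agree with the last row of the table and lie within about one unit of each other.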

The  important  principle  of  which  the  above  example  is  an  illus- 
tration is  perfectly  general,  and  may  be  stated  as  follows :  If  the 
total  number  of  measurements  or  observations  be  not  very  small, 
and  if  the  resulting  values  of  the  organ  or  character  measured 
(rent  in  our  case)  be  not  very  unequal,  any  reasonable  selection  of 
multipliers  or  weights  (as,  for  instance,  the  first  two  adopted  above) 
will  give  means  which  differ  from  one  another  by  but  little  ;  and 
even an apparently unreasonable selection of multipliers (as, for
instance, the third adopted above), assuming they are not so
wildly chosen as to give any particular group a very unfair weight
in comparison with the others, will not throw the mean out badly.
Further, in place of a set of large multipliers we may substitute
small numbers which are roughly proportional to them (as we have
done in the fourth case above), and the mean will again be very
little affected. [See Appendix, Note 2.]


AVERAGES (continued)

Applications  of  Weighted  Mean.  In  determining  the  weighted  mean 
of  a  set  of  observations  it  is  usual,  of  course,  to  weight  each  observa- 
tion according  to  its  importance,  though  what  number  should  be 
chosen  as  a  measure  of  its  importance  may  sometimes  be  a  matter 
of  doubt.  It  is  not  a  very  difficult  matter  to  decide  when  we 
wish,  for  example,  to  compare  birth,  marriage,  or  death  rates  in 
two  districts,  if  we  know  how  the  constitution  of  the  population 
in  the  one  district  differs  from  that  in  the  other,  for  the  weighting 
in  each  of  these  cases  must  be  in  proportion  to  the  population 
concerned,  and  it  is  too  important  to  ignore. 

Death  rate,  crude  and  corrected.  Imagine  a  city  in  which  the 
total  number  of  deaths  in  a  certain  year  is  N  out  of  a  population 
numbering  P. 

The ordinary or crude death rate for that city will then be

$$\frac{N}{P} \times 1000,$$

by definition.

Now  this  number  N  may  be  analysed  according  to  the  ages  of 
the  people  who  have  died  ;  let  us  suppose  it  is  made  up  of 

$n_1$ people between limits 0 and less than 5 years of age,
$n_2$ people between limits 5 and less than 15 years of age,
$n_3$ people between limits 15 and less than 25 years of age,

and so on, where

$$n_1 + n_2 + n_3 + \dots = N.$$

Again  the  number  P  may  be  analysed  according  to  the  ages  of 
the  people  who  compose  the  total  population,  giving,  say, 

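$p_1$ of the population between limits 0 and less than 5 years of age,
$p_2$ of the population between limits 5 and less than 15 years of age,
$p_3$ of the population between limits 15 and less than 25 years of age,

and so on, where

$$p_1 + p_2 + p_3 + \dots = P.$$

Thus we may write for the crude death rate

$$\begin{aligned}
D &= \frac{N}{P} \times 1000 \\
  &= \frac{n_1 + n_2 + n_3 + \dots}{P} \times 1000 \\
  &= \frac{n_1}{P}\,1000 + \frac{n_2}{P}\,1000 + \frac{n_3}{P}\,1000 + \dots \\
  &= \frac{p_1}{P}\left(\frac{n_1}{p_1}\,1000\right) + \frac{p_2}{P}\left(\frac{n_2}{p_2}\,1000\right) + \frac{p_3}{P}\left(\frac{n_3}{p_3}\,1000\right) + \dots \\
  &= (p_1 d_1 + p_2 d_2 + p_3 d_3 + \dots)/P,
\end{aligned}$$

where $d_1$ is the death rate between limits 0 and less than 5 years of age,
$d_2$ is the death rate between limits 5 and less than 15 years of age,
$d_3$ is the death rate between limits 15 and less than 25 years of age,

and so on.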

Now if we compare this expression with the corresponding one for
another city, say,

$$D' = (p_1' d_1' + p_2' d_2' + p_3' d_3' + \dots)/P',$$

it is quite conceivable that the death rates in the various age groups
might be equal,

$$d_1 = d_1', \quad d_2 = d_2', \quad d_3 = d_3', \ \dots$$

and yet D might exceed D' because in the first city there are a
greater proportion of infants or old people, on which classes the
hand of death falls heaviest, that is, because the p's or weights
which multiply the biggest d's are greater in the first case than in
the  second.  But  so  long  as  the  d's  in  the  two  cities  are  equal,  age 
group  by  age  group,  it  would  be  reasonable  to  regard  the  cities  as 
equally  healthy,  or  unhealthy  as  the  case  might  be,  and  therefore 
to  insure  a  fair  comparison  it  is  usual  in  the  Reports  of  the  Registrar- 
General  to  give  a  corrected  death  rate  in  place  of  the  crude  death 
rate  defined  above. 

This is done by weighting the death rate for each age group, not
in proportion to the actual number of persons in that group in
the city itself, but in proportion to the corresponding number in
the country at large. Thus, if we denote the proportion of the
population, Q,

between limits 0 and less than 5 in the country at large by $q_1/Q$,
between limits 5 and less than 15 in the country at large by $q_2/Q$,
between limits 15 and less than 25 in the country at large by $q_3/Q$,

and so on, we get as the corrected death rate

$$(q_1 d_1 + q_2 d_2 + q_3 d_3 + \dots)/Q,$$

a form which has the effect of making the results agree in two
cities which have equal d's throughout.
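
A small numerical sketch may make the effect of the correction plainer. All the figures below are hypothetical and chosen only for ease of arithmetic: two imaginary cities are given equal death rates in each of three broad age groups but different age constitutions, and the function names are our own.

```python
def crude_rate(deaths, population):
    """Deaths per 1000 living, all ages together: (N / P) x 1000."""
    return 1000 * sum(deaths) / sum(population)

def age_specific_rates(deaths, population):
    """d_i = (n_i / p_i) x 1000 for each age group."""
    return [1000 * n / p for n, p in zip(deaths, population)]

def corrected_rate(rates, standard_population):
    """(q_1 d_1 + q_2 d_2 + ...) / Q, weighting by the country at large."""
    total = sum(standard_population)
    return sum(q * d for q, d in zip(standard_population, rates)) / total

# Hypothetical figures for three broad age groups (young, middle, old).
standard = [30_000, 50_000, 20_000]  # q_i, the country at large
cities = {
    "A": {"population": [4_000, 4_000, 2_000], "deaths": [80, 20, 100]},
    "B": {"population": [2_000, 7_000, 1_000], "deaths": [40, 35, 50]},
}
for name, city in cities.items():
    rates = age_specific_rates(city["deaths"], city["population"])
    print(name, crude_rate(city["deaths"], city["population"]),
          corrected_rate(rates, standard))
```

With these invented numbers the crude rates are 20.0 and 12.5 per thousand, although both cities have death rates of 20, 5 and 50 in the three groups; the corrected rate is 18.5 for each.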

A similar method of correction is clearly applicable in considering
the incidence of the death rate when we are concerned not
with a difference of district but with a difference of sex, occupation,
religious profession, wage-earning capacity, or any other well-defined
character. Further, it may be used also in comparing birth
rates, marriage rates, heights, weights, chest measurements, or any
similar attributes, when it is necessary to refer the observations
or measurements to a standard population in order to avoid
complications due to age variation.

There is another method of correction, equally general in application,
which is useful when the death rates in the various age groups
are not known. In this case D, the crude death rate for the whole
population of the district is known, also $p_1/P$, $p_2/P$, $p_3/P$, ... the
proportions of the population between the various age limits, but
$d_1$, $d_2$, $d_3$ ... are supposed unknown.

Now if the population in the country as a whole were the same in
corresponding age groups as it is in the district under consideration,
we should get as the death rate for the whole country

$$(p_1 \delta_1 + p_2 \delta_2 + p_3 \delta_3 + \dots)/P,$$

where $\delta_1$, $\delta_2$, $\delta_3$ ... are the death rates in the various age groups in
the country at large, and these would in practice as a rule be known.
The actual death rate for the whole country is, however,

$$(q_1 \delta_1 + q_2 \delta_2 + q_3 \delta_3 + \dots)/Q,$$

where $q_1/Q$, $q_2/Q$, $q_3/Q$ ... denote, as before, the real proportions
of the population in the various age groups in the country at large.
We take as the corrected death rate required for the district a
number bearing to the crude death rate the same ratio as

$$(q_1 \delta_1 + q_2 \delta_2 + \dots)/Q \ \text{ bears to } \ (p_1 \delta_1 + p_2 \delta_2 + \dots)/P.$$

Hence we have

$$\frac{\text{corrected death rate}}{\text{crude death rate}} = \frac{q_1 \delta_1 + q_2 \delta_2 + \dots}{p_1 \delta_1 + p_2 \delta_2 + \dots} \times \frac{P}{Q}.$$
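
This second method may be sketched in the same way. The figures are again hypothetical, and the function name `indirectly_corrected_rate` is our own; only the district's crude rate and age constitution are supposed known, together with the rates for the country at large.

```python
def indirectly_corrected_rate(crude, district_population, national_population,
                              national_rates):
    """Scale a district's crude rate by (actual national rate) / (expected rate)."""
    P, Q = sum(district_population), sum(national_population)
    # National death rate as it actually is: (q_1 delta_1 + q_2 delta_2 + ...) / Q.
    actual = sum(q * r for q, r in zip(national_population, national_rates)) / Q
    # National rate if the country had the district's age constitution.
    expected = sum(p * r for p, r in zip(district_population, national_rates)) / P
    return crude * actual / expected

# Hypothetical figures for three broad age groups.
national_population = [30_000, 50_000, 20_000]   # q_i
national_rates = [20.0, 5.0, 50.0]               # delta_i, per 1000
district_population = [4_000, 4_000, 2_000]      # p_i
district_crude_rate = 24.0                       # D, per 1000

print(round(indirectly_corrected_rate(district_crude_rate, district_population,
                                      national_population, national_rates), 1))  # 22.2
```

Here the district has more than its share of the very young and the old, so that a rate of 20 would be expected of it even at national mortality; its crude rate of 24 is accordingly scaled down in the ratio 18.5 to 20.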

Index Numbers to compare Household Budgets. Another highly
important illustration of a weighted mean occurs in the search for a
satisfactory measure of the change in the cost of living from year
to  year.  We  have  already  introduced  the  subject  of  variation  in 
wholesale  prices,  and  we  have  seen  that  Sauerbeck,  in  forming  his 
index  numbers,  treats  as  one  each  of  the  forty-five  commodities 
he  uses  to  measure  this  variation  :  the  observations,  that  is  to 
say,  are  not  weighted. 

But,  confining  our  attention  to  food  alone,  supposing  we  have 
five  items,  such  as  bacon,  bread,  tea,  sugar,  milk,  for  which  the 
index  numbers  of  prices  at  two  different  dates  are  : — 


               Bacon.   Bread.   Tea.   Sugar.   Milk.
First date       100      100     100     100     100
Second date      117       95      94     102     109

Is  it  really  right  to  treat  each  of  these  items  as  of  equal  importance 
with  the  rest,  or  ought  we  to  regard  bread  and  tea,  say,  as  of  more 
weight  than  bacon,  and  count  bread  perhaps  five  times  and  tea 
three  times  while  counting  bacon  only  once  ?  It  is  clear  that,  in 
order to select a reasonable set of multipliers in this case, we should
need  to  know  the  standard  of  living  of  the  class  of  people  under 
consideration,  and  how  much  in  the  aggregate  they  spend  upon 
bacon  and  how  much  upon  bread,  etc. 
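
The difference the weighting can make with so few items is easily seen by trial. In the sketch below the multipliers for bread (5), tea (3) and bacon (1) are those suggested in the question above, while the multipliers of 1 for sugar and milk are purely our own assumption for the sake of the illustration; `mean_index` is likewise a name of our own.

```python
def mean_index(index_numbers, weights=None):
    """Mean of index numbers; equal weights unless others are supplied."""
    if weights is None:
        weights = {item: 1 for item in index_numbers}
    total_weight = sum(weights[item] for item in index_numbers)
    weighted_sum = sum(weights[item] * value
                       for item, value in index_numbers.items())
    return weighted_sum / total_weight

second_date = {"bacon": 117, "bread": 95, "tea": 94, "sugar": 102, "milk": 109}
# Bread x5, tea x3, bacon x1 follow the text; sugar and milk at 1 are assumed.
illustrative_weights = {"bacon": 1, "bread": 5, "tea": 3, "sugar": 1, "milk": 1}

print(round(mean_index(second_date), 1))                        # 103.4
print(round(mean_index(second_date, illustrative_weights), 1))  # 98.6
```

With only five items the unweighted mean shows a rise and the weighted mean a fall, which is why the choice of multipliers cannot be left to chance when the commodities are few.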

A  partial  answer  to  these  questions  can  be  obtained  by  making 
a  collection  of  household  budgets  as  was  done,  for  example,  by  two 
Government  Committees  which  recently  reported  (1918-19)  on  the 
Cost of Living among the Urban and the Agricultural Working Classes
respectively. If the number of commodities employed is large,
even an arbitrary set of multipliers, as we have indicated, will not
displace  the  mean  any  great  distance  from  the  value  when  reason- 
able weights  are  chosen,  but  unfortunately  in  collecting  such  house- 
hold budgets  we  are  confined  to  the  comparatively  limited  variety 
of  food-stuffs  which  are  in  general  use. 

Different principles may be followed in making the comparison
between one year and another which may be illustrated by a few
figures from the Urban Classes Report (1918) :

Table (10). Household Budgets showing Prices of each Commodity
and Quantities Purchased at Two Different Dates by Typical Family.

               First year (1914).          Second year (1918).
Commodity.     x_1            n_1          x_2            n_2
               Price (pence   No. of lb.   Price (pence   No. of lb.
               per lb.)       bought.      per lb.)       bought.
Sugar            2.2            5.9          7.07           2.83
Tea             21.3            0.68        33.3            0.57
Potatoes         0.7           15.6          1.26          20.0

Let $x_1$ be the price, in pence per unit, of any one commodity
at the first date, and let $n_1$ be the number of units of this commodity
bought per week by a typical family ($n$ may be estimated in different
ways, e.g. (1) by dividing the total number of units bought by
all families by the total number of those families, or (2) by ranging
the different amounts bought by different families in order of
magnitude and picking out the median amount, or (3) by choosing
the mode, i.e. the amount most commonly purchased). Also let $x_2$
be the price, in pence per unit, of the same commodity at the second
date, and let $n_2$ be the number of units of the commodity then
bought per week by the typical family estimated in the same way
as before.

The actual expenditure, measured in pence, at the two dates
will then be

$S(x_1 n_1)$ and $S(x_2 n_2)$

respectively, where $S(x_1 n_1)$ simply denotes the sum of expressions
like $(x_1 n_1)$ for all the commodities recorded and $S(x_2 n_2)$ denotes the
sum of expressions like $(x_2 n_2)$ for all the commodities recorded,
$S$, the old English S, being a well-known conventional abbreviation
for 'Sum of expressions like.' Thus, with the numbers in Table (10),
we should have

$S(x_1 n_1) = (2.2)(5.9) + (21.3)(0.68) + (0.7)(15.6) + \ldots$
$S(x_2 n_2) = (7.07)(2.83) + (33.3)(0.57) + (1.26)(20.0) + \ldots$
Taking 100 as the index number to represent expenditure at the
first date, the index number measuring expenditure at the second
date may be formed in any of the following different ways,* which
as a rule, of course, lead to different results:

(1) $100\,S(x_2 n_2)/S(x_1 n_1)$;

(2) $100\,S(x_2 n_1)/S(x_1 n_1)$ or $100\,S(x_2 n_2)/S(x_1 n_2)$;

(3) $100\,S(x_1 n_2)/S(x_1 n_1)$ or $100\,S(x_2 n_2)/S(x_2 n_1)$.

The first of these expressions compares the actual expenditure at
the second date to that at the first date.

The  next  two  expressions  take  into  account  directly  only  the 
change  in  prices  ;  they  compare,  not  actual  expenditures  but,  the 
expenditures  at  the  two  dates  as  they  would  be  if  the  amounts 
purchased at the two dates were the same: the first supposing
these  amounts  to  equal  those  actually  bought  at  the  first  date, 
and  the  second  supposing  them  to  equal  those  actually  bought 
at  the  second  date. 

The  last  two  expressions,  on  the  other  hand,  take  into  account 
directly  only  the  change  in  amounts  purchased ;  they  compare 
the  expenditures  at  the  two  dates  as  they  would  be  if  the  prices 
ruling  at  the  two  dates  were  the  same  :  the  first  supposing  these 
prices  to  equal  those  actually  charged  at  the  first  date,  and  the 
second  supposing  them  to  equal  those  actually  charged  at  the 
second  date. 
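The five expressions may be tried out on the three commodities printed in Table (10). The Python sketch below is only an illustration: it uses sugar, tea and potatoes alone, so the results are not the figures of the Report, and the names `budget` and `total` are introduced here for convenience.

```python
# Prices (pence per lb.) and quantities (lb. per week) from Table (10).
# Only the three commodities printed in the table are included.
budget = {
    # commodity: (x1, n1, x2, n2)
    "sugar":    (2.2, 5.9, 7.07, 2.83),
    "tea":      (21.3, 0.68, 33.3, 0.57),
    "potatoes": (0.7, 15.6, 1.26, 20.0),
}

X1, N1, X2, N2 = 0, 1, 2, 3  # positions of x1, n1, x2, n2 in each row


def total(price_col, qty_col):
    """Sum of (price * quantity) over all the commodities."""
    return sum(row[price_col] * row[qty_col] for row in budget.values())


indices = {
    "(1) actual expenditure":          100 * total(X2, N2) / total(X1, N1),
    "(2) prices, first-date amounts":  100 * total(X2, N1) / total(X1, N1),
    "(2) prices, second-date amounts": 100 * total(X2, N2) / total(X1, N2),
    "(3) amounts, first-date prices":  100 * total(X1, N2) / total(X1, N1),
    "(3) amounts, second-date prices": 100 * total(X2, N2) / total(X2, N1),
}

for name, value in indices.items():
    print(f"{name}: {value:.1f}")
```

For these three items the expenditure index comes to about 167, the two price indices to about 219 and 198, and the two quantity indices to about 84 and 76. The price index with first-date amounts multiplied by the quantity index with second-date prices, divided by 100, reproduces the expenditure index exactly, as the algebra requires.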

The  particular  method  of  weighting  adopted  must  naturally 
depend  upon  the  circumstances  of  the  period  under  discussion 
and  the  nature  of  the  inquiry  one  is  making ;  it  is  a  nice  question 
to  decide  how  far  emphasis  should  be  laid  upon  the  old  standard 
of  life  (measured  by  food,  lighting,  rent,  recreation,  etc.)  with  the 
expense  required  to  maintain  it,  and  upon  the  new  standard  of  life 
and  the  cost  necessary  to  reach  it. 

It  may  be  useful  here  to  summarize  a  few  of  the  questions  of 
interest  which  present  themselves  in  connection  with  the  formation 
of  index  numbers  of  prices  designed  to  measure  changes  in  the 
value  of  money  in  general  without  reference  to  any  particular  class 
of  the  community  : — 

1.  What  years  should  be  selected  in  fixing  our  standard  prices  ? 

2.  What  commodities  should  be  chosen  as  a  basis  for  our 
average  ? 

[* See also The Measurement of Changes in the Cost of Living, by A. L. Bowley, Sc.D.,
in the Journal of the Royal Statistical Society, May 1919, for a more complete
discussion of the subject.]

3. What weight should be given to each commodity in relation
to the rest?

4. How should the prices of the several commodities be determined,
bearing in mind that 'price' itself frequently varies from
place to place?

5.  Finally,  how  should  these  prices  be  combined  to  give  the 
average  required  ?  Should  we  use  the  simple  arithmetic  mean,  the 
geometric  mean  [see  Appendix,  Note  3],  the  median,  or  some  other 
measure  ? 

While  we  are  not  prepared  to  attempt  to  answer  these  questions 
fully,  seeing  that  authorities  are  not  altogether  agreed  as  to  what 
the  answers  should  be,  one  or  two  points  may  be  worth  noting. 
Generally  speaking  we  may  say  that : — 

1.  The  years  selected  in  fixing  our  standard  prices  should  be 
years  in  which  economic  conditions  were  normal  rather  than 
abnormal. 

2. The commodities chosen should be articles of general consumption,
and as wide a field as possible should be covered in their
choice.

3. Many consider that little is gained by weighting, but, if
weights are introduced, the greater the importance of any commodity
in relation to the rest, judged for example by the relative
quantity consumed, the greater should be the weight assigned
to it.

4. The practical difficulty of assessing retail prices when they
are uncontrolled compels us in general to fall back upon wholesale
quotations, on which some light may be thrown by keeping
under observation the important markets for the sale of each
commodity.

5.  The  average  commonly  used  is  the  simple  arithmetic  or  the 
weighted  mean,  though  arguments  can  be  adduced  in  favour  of 
other  averages  such  as  the  median. 

Leaving index numbers now on one side and returning to the
general subject of averages, we may remark that the question
which average is correct in any given case, the mean (weighted or
otherwise), the median, or the mode, does not arise: no one average
is more correct than another, because they are all entirely conventional
and represent different ideas; they correspond in fact
to so many different ways of summing up a set of observations or
measurements in a single numerical statement, and the real question
to determine is which statement, which kind of average, brings the
set of observations before us to the best focus.

For  this  purpose  one  average  will  clearly  be  best  in  one  case  and 
another  in  another,  but  it  may  be  stated  without  hesitation  that 
the  arithmetic  mean  is  certainly  the  most  useful  of  the  three  and 
it is the most frequently used. Other averages have been suggested,
such as the geometric and the harmonic means [see Appendix,
Note 3] familiar to students of Algebra, but they are only suitable
in a comparatively small class of problems.

In  a  reasonably  symmetrical  distribution  of  observations,  one  in 
which  the  variables  of  medium  size  are  the  most  frequent  and  the 
frequency  diminishes  about  equally  on  either  side  towards  the 
largest  and  the  least  of  the  variables,  the  values  of  the  mean,  the 
median,  and  the  mode  will  be  found  to  lie  all  very  close  together  ; 
and  a  useful  practical  rule  to  remember  is  that  the  median  comes 
in  general  between  the  mean  and  the  mode,  the  difference  between  the 
mean  and  the  mode  being  about  three  times  the  difference  between  the 
mean  and  the  median.  This  rule,  for  lack  of  a  better,  might  be  used 
to  determine  the  mode  in  suitable  cases,  or  it  might  be  used  to  test 
the  value  found  in  some  other  way. 

The general term 'average' is frequently used when the particular
denomination 'arithmetic mean' is implied, but the context
will usually prevent misunderstanding.

In  order  to  get  a  clear  impression  of  the  outstanding  features 
presented  by  the  three  chief  averages  discussed,  let  us  go  over  them 
once  more  in  the  case  of  marks  awarded  to  a  number  of  students 
in  a  class.  All  three  may  be  regarded  as  in  a  sense  measures  of 
the  standard  reached  by  the  class  as  a  whole  in  the  examination, 
but  the  measures  are  made  in  different  ways  : — 

1.  The  Arithmetic  Mean  is  found  by  merely  dividing  the  aggregate 
marks  of  the  class  by  the  number  of  the  students,  and  it  gives  the 
marks  earned  by  each  student  if  we  conceive  them  all  to  be  of 
equal  merit. 

2. The Median is found by ranging the students in order of merit
from  top  to  bottom,  and  picking  out  the  marks  awarded  to  the  one 
who  comes  half-way  down  the  list. 

3.  The  Mode  is  the  most  fashionable  number  of  marks,  i.e.  the 
marks  obtained  by  the  greatest  number  of  candidates. 
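The three definitions can be followed on a small set of marks. The list below is hypothetical (it is not taken from the text) and the sketch relies on Python's standard `statistics` module.

```python
from statistics import mean, median, mode

# Hypothetical marks for a class of eleven students (not from the text).
marks = [35, 42, 48, 50, 50, 50, 55, 61, 64, 70, 91]

arithmetic_mean = mean(marks)    # aggregate marks / number of students
middle_mark = median(marks)      # mark of the student half-way down the list
most_common_mark = mode(marks)   # mark obtained by the greatest number

print(arithmetic_mean, middle_mark, most_common_mark)
```

Here the single high mark of 91 pulls the mean above the median and the mode, which coincide.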

The  advantages  and  disadvantages  of  the  three  types  may  be 
set  out  broadly  as  follows,  although  the  boundary  lines  must  not 
be  too  strictly  drawn  : — 
Mean.                       Median.                     Mode.

Easy to calculate when      Easy to pick out when       Not easy to determine
the values of the           the individuals can be      with precision, when
variable can be summed      ranged in order             the observations fall
and their number is         according to the value      into groups of different
known.                      or degree of the            ranges, without fitting
                            variable observed.          a frequency curve to the
                                                        distribution as a whole.

Well designed for           Unsuited for algebraical    Unsuited for algebraical
algebraical manipulation,   work.                       work.
as, for example, when we
wish to combine different
sets of observations [see
Appendix, Note 4, for two
illustrations].

Affected sometimes too      Determined merely by its    Unaffected by abnormal
much by abnormal            position in the             individuals, and owes its
individuals among the       distribution, and its       importance to the fact
observations.               actual value is thus        that it is located in the
                            quite unaffected by         region where the
                            abnormal individuals.       frequency is most dense.

The reader should test his grasp of the principles so far introduced
by applying them himself to a concrete case. For example,
he might use the data in Table (11), with regard to wages earned
by certain women, taken from Tawney's Minimum Wages in the
Tailoring Trade, and based upon the 1906 Wages Census. Let him
begin by roughly estimating the mean, the median, and the mode
from an inspection of the distribution. He might then proceed
to calculate the mean wage:

(1) taking the actual frequencies given in the table;

(2) taking simple sub-multiples of these frequencies, roughly
one-hundredth part of each: 2, 4, 6, 7, 9, 11, etc.;

(3) assuming unit frequency in place of that given in the table for
each wage group.

Finally, he might determine the median and the mode in the
manner explained in the text, deducing the latter from the relation
(mean - mode) = 3(mean - median).

[Answers. Mean: (1) 13.08s.; (2) 13.10s.; (3) 15.59s.
Median = 12.53s.; Mode = 11.43s.]
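The relation used to deduce the mode may be checked against the answers quoted for this exercise. The function name below is introduced here only for illustration; the rule itself is approximate and is intended for reasonably symmetrical distributions.

```python
def mode_from_mean_and_median(mean, median):
    """Estimate the mode from (mean - mode) = 3 * (mean - median)."""
    return mean - 3 * (mean - median)


# Mean (actual frequencies) and median quoted for Table (11), in shillings.
estimated_mode = mode_from_mean_and_median(13.08, 12.53)
print(round(estimated_mode, 2))
```

The result agrees with the mode of 11.43s. quoted in the answers.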


Table (11). Distribution of Wages of certain Women Tailors.

        (1)                  (2)                  (3)                  (4)
Wages between limits.    No. of Women     Wages between limits.    No. of Women
                         earning wages                             earning wages
                         as shown in                               as shown in
                         Column (1).                               Column (3).

 5s. and less than  6s.  [illegible]      16s. and less than 17s.      642
 6s. and less than  7s.      384          17s. and less than 18s.      453
 7s. and less than  8s.      553          18s. and less than 19s.      401
 8s. and less than  9s.      690          19s. and less than 20s.      272
 9s. and less than 10s.      900          20s. and less than 21s.      251
10s. and less than 11s.     1145          21s. and less than 22s.      138
11s. and less than 12s.     1201          22s. and less than 23s.      124
12s. and less than 13s.     1138          23s. and less than 24s.       64
13s. and less than 14s.      930          24s. and less than 25s.       5[?]
14s. and less than 15s.      885          25s. and less than 30s.      122
15s. and less than 16s.      790                    ..                  ..

CHAPTER   VI 

DISPERSION    OR   VARIABILITY 

Let  us  suppose  that  two  men  set  out  separately  on  walking  tours 
and  that  they  walk  as  follows  : — 


                First Man       Second Man
                walks           walks

First day  .    20 miles.       15 miles.
Second day .    20 miles.       20 miles.
Third day  .    25 miles.       25 miles.
Fourth day .    25 miles.       25 miles.
Fifth day  .    30 miles.       30 miles.
Sixth day  .    30 miles.       35 miles.

6 days          150 miles.      150 miles.

The  total  distance  covered  in  six  days,  namely  150  miles,  and 
therefore  also  the  mean  rate  of  walking,  25  miles  a  day,  are  thus 
exactly  the  same  in  both  cases,  but  the  dispersion  of  the  values  of 
the  variable  (the  variable  being  in  this  instance  the  number  of 
miles  walked  per  day)  round  about  their  mean  value,  the  variability, 
is  different  in  the  two  cases.  The  greatest  deviation  from  the 
average  in  the  first  case  is  five  and  in  the  second  case  it  is  ten  miles. 
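The comparison just made can be written out in a few lines of Python, offered only as a sketch; the helper names are introduced here for illustration.

```python
# Miles walked on each of the six days, as in the table above.
first_man = [20, 20, 25, 25, 30, 30]
second_man = [15, 20, 25, 25, 30, 35]


def mean(values):
    return sum(values) / len(values)


def greatest_deviation(values):
    """Largest distance of any single value from the mean."""
    m = mean(values)
    return max(abs(v - m) for v in values)


for name, walk in (("first man", first_man), ("second man", second_man)):
    print(name, sum(walk), mean(walk), greatest_deviation(walk))
```

Both totals and both means agree, while the greatest deviation is twice as large for the second man.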

Thus,  besides  knowing  the  average  of  a  set  of  values  of  a  variable 
it  is  important  to  measure  the  dispersion  of  the  distribution.  Are 
the  observations  crowded  in  a  dense  mass  around  the  average, 
or  do  they  tail  off  above  and  below  it,  and  to  what  extent  ? 
In  other  words,  what  is  the  variability  from  the  average  of  the 
distribution  ? 

Mean  Deviation.  Now  we  are  not  concerned  here  with  the  signs 
of  the  separate  deviations,  with  the  question,  that  is,  whether  any 
particular  value  of  the  variable  lies  above  or  below  the  average  : 
it is only of their amount we wish to take cognizance, and perhaps
the most obvious way to measure the total variability and at the
same time to ignore the signs of the separate deviations from the
average is to add up these deviations, treating them all as signless,
and to divide the result by their total number. This gives what
is known as the mean deviation of the system of observations: it
is the ordinary arithmetic mean of the separate deviations, treated
as if they are all in the same direction, and, in measuring them, we
may use either the mean or the median as the average, but it
would seem preferable to take the latter because the mean deviation
is least when the median is chosen as the origin, or zero point, from
which the differences are measured. The proof of this fact will
be found in Note 6 in the Appendix, but we may readily test it in
a given case.

Let us adapt the 'walking' illustration used above, slightly
extending the figures and making them unsymmetrical, i.e. of
unequal variability on either side of the average, so as to prevent
the median coinciding with the mean. We then have an amended
table setting out the number of miles walked by a certain man on
successive days during, say, a fortnight's tour, as follows:


Table (12). Number of Miles walked on Successive Days.

  (1)      (2)       (3)         (4)          (5)         (6)          (7)            (8)

No. of   Miles        x          x_1          x_2         x_3       [No. in        [No. in
days.    walked.  Deviation   Deviation    Deviation   Deviation    Col. (1)] x    Col. (1)] x
                  from 25.    from 24.64.  from 24.    from 26.     [No. in        [No. in
                                                                    Col. (3)].     Col. (4)].

   1       10        15         14.64         14          16           15            14.64
   2       15        10          9.64          9          11           20            19.28
   3       20         5          4.64          4           6           15            13.92
   3       25        ..          0.36          1           1           ..             1.08
   2       30         5          5.36          6           4           10            10.72
   2       35        10         10.36         11           9           20            20.72
   1       40        15         15.36         16          14           15            15.36

  14       ..        ..           ..          ..          ..           95            95.72

The  first  two  columns  show  that  10  miles  was  the  distance  walked 
on  the  first  day,  15  miles  on  each  of  the  next  two  days,  20  miles 
on  each  of  the  next  three  days,  and  so  on  until  the  last  day,  when 
40  miles  was  the  distance  walked. 
The median in this case, being the number of miles walked on
the  middle  day  when  the  days  are  ranged  in  order  of  mileage  from 
the  least  to  the  greatest,  is  25,  for  this  is  the  distance  covered  on 
both  the  seventh  and  the  eighth  days  which  come  half-way  along 
the  series. 

Col.  (3)  shows  the  deviations  from  the  median,  25,  of  the  distances 
covered  each  day  as  recorded  in  col.  (2),  and  col.  (7)  enables  us  to 
sum  these  deviations  when  each  is  multiplied  by  the  number  of 
days  to  which  it  corresponds,  since  these  numbers,  given  in  col.  (1), 
show  how  many  times  each  deviation  is  repeated.  Hence  the  mean 
deviation,  regardless  of  sign,  measured  from  the  median 

=[(1x15)+(2x10)+(3x5)+(2x5)+(2x10)+(1x15)]/14
=(15+20+15+10+20+15)/14
=95/14
=6.79 miles.

We  may  compare  this  with  the  corresponding  deviations  measured 
from  (1)  the  arithmetic  mean,  (2)  the  number  24,  and  (3)  the 
number  26  as  origin  respectively. 

1. The arithmetic mean of the distribution is obtained at once
by multiplying the corresponding numbers in cols. (1) and (2),
adding the results, and dividing the total by 14, thus

(10+30+60+75+60+70+40)/(1+2+3+3+2+2+1)
=345/14
=24.64 miles,

and the deviations from 24.64 are shown in col. (4); the mean
deviation from 24.64, obtained by combining cols. (1) and (4) and
adding as shown in col. (8),

=[1(14.64)+2(9.64)+ . . . ]/14
=95.72/14
=6.84 miles.

2. Similarly, the mean deviation from 24, making use of col. (5),

=[1(14)+2(9)+ . . . ]/14
=6.93 miles.

3. And the mean deviation from 26, making use of col. (6),

=[1(16)+2(11)+ . . . ]/14
=7.07 miles.

The  original  determination  gives  a  value  which  is  less  than  any 
of  these  three  results,  as  was  anticipated. 
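The four calculations can be repeated mechanically. The following Python sketch takes the first two columns of Table (12) and measures the mean deviation from each origin in turn; the function name is introduced here for illustration.

```python
# Table (12): miles walked and the number of days on which each was walked.
miles = [10, 15, 20, 25, 30, 35, 40]
days = [1, 2, 3, 3, 2, 2, 1]


def mean_deviation(origin):
    """Mean of the signless deviations from `origin`, counted by frequency."""
    total_days = sum(days)
    return sum(f * abs(x - origin) for x, f in zip(miles, days)) / total_days


arithmetic_mean = sum(f * x for x, f in zip(miles, days)) / sum(days)

# Origins: the median (25), the mean, and the two trial values 24 and 26.
for origin in (25, arithmetic_mean, 24, 26):
    print(round(origin, 2), round(mean_deviation(origin), 2))
```

The smallest of the four values is the one measured from the median, 25.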

The  mean  deviation  from  the  median  is,  however,  difficult  to 
calculate  with  exactness  when  the  observations  are  recorded  in 
groups  between  different  limits  :  for  this  and  other  reasons  we 
shall  not  spend  much  time  upon  it,  and  we  shall  as  a  rule  choose 
the  mean  as  origin  of  reference  rather  than  the  median.  It 
may  be  as  well  to  explain  the  source  of  the  difficulty  by  a  small 
hypothetical  illustration. 

Let us suppose that in making measurements of some organ or
character in 13 individuals we get a result lying between 4 and 6
units on six occasions, between 6 and 8 units on four occasions, and
between 8 and 10 units on three occasions. Here, assuming that all
the individuals in any group have the mid-value measurement for
that group, i.e. treating the distribution as one of 6 individuals
with a variable measuring 5 units, 4 individuals with a variable
measuring 7 units, and 3 individuals with a variable measuring
9 units, we get 18/13 as the mean deviation with 7 as origin and 18.5/13
for the mean deviation with 6.5 as origin, as the following table
shows:


Measurement.           Frequency.       x            y          fx      fy
                           f        Deviation    Deviation
                                    from 7.      from 6.5.

4 and less than  6         6            2           1.5         12       9
6 and less than  8         4            0           0.5         ..       2
8 and less than 10         3            2           2.5          6       7.5

                          13           ..           ..          18      18.5

Now the result obtained is in agreement with the minimum
mean deviation theory, granted that 7 is the median measurement,
as it might certainly be. But it is not so of necessity, and in that
case the mid-value assumption made above might lead, in the above
calculation, to appreciable inaccuracy unless the number of
observations is large and the class-interval is small. For example, the actual
distribution might, without contradicting the previous data,
conceivably run:


Measurement.    Frequency.       x'           y'          fx'     fy'
                    f        Deviation    Deviation
                             from 7.      from 6.5.

    5               6            2           1.5         12       9
    6.5             2            0.5         ..           1       ..
    7.5             2            0.5         1            1       2
    9               3            2           2.5          6       7.5

                   13           ..           ..          20      18.5

But in this case the median, the measurement for the seventh individual
from either end of the series, is 6.5, and according to the
first calculation the mean deviation referred to 6.5 as origin appears
to be greater than that referred to 7 as origin. If, however, we
recalculate, using the more detailed table, we find that the mean
deviation referred to 6.5 as origin (18.5/13) is really less than the mean
deviation with reference to 7 as origin, as it should be, for the
latter now turns out to be 20/13.
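The source of the difficulty is easily seen by running both versions of the distribution through the same calculation. In the Python sketch below the grouped version places every individual at the mid-value of its group, while the detailed version uses the conceivable actual values given above; the variable names are introduced here for illustration.

```python
def mean_deviation(values, freqs, origin):
    """Mean of the signless deviations from `origin`, counted by frequency."""
    return sum(f * abs(v - origin) for v, f in zip(values, freqs)) / sum(freqs)


# Grouped data with every individual placed at the mid-value of its group.
grouped_values, grouped_freqs = [5, 7, 9], [6, 4, 3]
# One conceivable actual distribution consistent with the same grouping.
detailed_values, detailed_freqs = [5, 6.5, 7.5, 9], [6, 2, 2, 3]

grouped_from_7 = mean_deviation(grouped_values, grouped_freqs, 7)
grouped_from_6_5 = mean_deviation(grouped_values, grouped_freqs, 6.5)
detailed_from_7 = mean_deviation(detailed_values, detailed_freqs, 7)
detailed_from_6_5 = mean_deviation(detailed_values, detailed_freqs, 6.5)

print(grouped_from_7, grouped_from_6_5)    # 18/13 and 18.5/13
print(detailed_from_7, detailed_from_6_5)  # 20/13 and 18.5/13
```

On the grouped figures the origin 7 appears to give the smaller mean deviation; on the detailed figures the true median, 6.5, does so.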

Standard Deviation. An alternative method of avoiding the
signs of the deviations from the average in order to estimate the
amount of variability of the distribution is to square each separate
deviation, sum the squares, divide by their number, and take the
square root of the result. This gives the root-mean-square deviation,
and it is least when the arithmetic mean of the variables is chosen
as origin from which to measure the deviations, when it is known
as the standard deviation. For proof of this minimum principle
see Appendix, Note 5, but it is worth while testing it also with the
data given in Table (12).

The numbers in cols. (3) to (6) in Table (13) are obtained simply
by squaring the corresponding numbers in the same cols. (3) to (6)
in Table (12). Col. (7) is formed in order to enable us to calculate
the mean-square deviation referred to 25 as origin; the numbers
in col. (3) show the squares of the deviations for each individual
observation, and the numbers in col. (1), by which they are multiplied,
show how frequently the same values are repeated. Hence
we get the mean-square deviation with reference to 25

=[1(225)+2(100)+3(25)+2(25)+2(100)+1(225)]/14
=975/14
=69.64.

Thus the root-mean-square deviation referred to 25

=8.345.

Similarly, by means of col. (8), formed on exactly the same
principle, we find that the root-mean-square deviation referred to
24.64 as origin

=sqrt[(214.33+185.86+ . . . )/14]
=sqrt(973.22/14)
=8.338.

But 24.64 is the mean of the distribution, hence 8.338 is the standard
deviation.

With the help of cols. (5) and (6) the student may himself calculate
the root-mean-square deviation with regard to 24 and 26
respectively as origin; the results should be 8.36 and 8.45. Of
the four values thus obtained for the root-mean-square deviation,
the least is that referred to the mean as origin, the standard deviation,
now proposed as a measure of variability or dispersion suitable
for most general purposes.
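The same test may be carried out in Python. The sketch below works from the miles and frequencies of Table (12) and uses the exact mean, 345/14, rather than the rounded 24.64; the function name is introduced here for illustration.

```python
from math import sqrt

# Table (12): miles walked and the number of days on which each was walked.
miles = [10, 15, 20, 25, 30, 35, 40]
days = [1, 2, 3, 3, 2, 2, 1]


def root_mean_square_deviation(origin):
    """Square the deviations, average them by frequency, take the root."""
    mean_square = sum(f * (x - origin) ** 2 for x, f in zip(miles, days)) / sum(days)
    return sqrt(mean_square)


arithmetic_mean = sum(f * x for x, f in zip(miles, days)) / sum(days)
standard_deviation = root_mean_square_deviation(arithmetic_mean)

for origin in (25, arithmetic_mean, 24, 26):
    print(round(origin, 2), round(root_mean_square_deviation(origin), 3))
```

Of the four origins tried, the mean gives the least value, which is the standard deviation.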

This  measure  possesses  several  decided  advantages  over  the 
mean  deviation  ;  among  others  it  lends  itself  more  easily  to  certain 
algebraical  processes  (see,  for  example,  p.  158),  a  fact  of  importance 
when  we  wish,  for  instance,  to  discuss  two  sets  of  observations  in 
combination,  and  it  is  in  general  less  affected  by  '  fluctuations  of 
sampling  ' — errors  which  arise  owing  to  the  fact  that  we  cannot  as 
a rule survey the whole field of operations, but have to be content
with  a  sample. 

Table (13). Number of Miles walked on Successive Days.

  (1)      (2)       (3)          (4)           (5)          (6)           (7)             (8)

No. of   Miles       x^2         x_1^2         x_2^2        x_3^2         fx^2           fx_1^2
days.    walked.  Square of    Square of     Square of    Square of    [No. in         [No. in
                  Deviation    Deviation     Deviation    Deviation    Col. (1)] x     Col. (1)] x
                  from 25.     from 24.64.   from 24.     from 26.     [No. in         [No. in
                                                                       Col. (3)].      Col. (4)].

   1       10       225         214.33         196          256          225            214.33
   2       15       100          92.93          81          121          200            185.86
   3       20        25          21.53          16           36           75             64.59
   3       25        ..           0.13           1            1           ..              0.39
   2       30        25          28.73          36           16           50             57.46
   2       35       100         107.33         121           81          200            214.66
   1       40       225         235.93         256          196          225            235.93

  14       ..        ..           ..            ..           ..          975            973.22
Quartile Deviation or Semi-interquartile Range. There is a third
measure of dispersion, based upon the determination of the quartiles,
and to introduce them we may refer again to Table (7) in order to
show how the idea of the median may be extended.

We define the individual occupying a position one-quarter the
way along any series of observations, arranged in ascending order
of magnitude of some organ or character common to all the individuals
of the series, as the lower quartile; and we define the individual
occupying a position three-quarters the way along the series
as the upper quartile.

When the distribution of observations is divided up into groups
lying between different limits of the variable under consideration
the quartiles may, like the median, be calculated by interpolation.
Thus, in the examination example, the total number of candidates
is 514 and ¼(514)=128.5.

But the 91st candidate from the bottom gets approximately 20
marks, and the 149th candidate from the bottom gets approximately
25 marks. Hence the imaginary candidate, No. 128.5,
should get a number of marks lying somewhere between 20 and
25. But if, in this neighbourhood, a difference of

(149-91) candidates corresponds to a difference of 5 marks,
(128.5-91) candidates should correspond to a difference of 5 x 37.5/58 marks.

Thus, the marks assigned to the lower quartile candidate are
approximately

20 + 5 x 37.5/58 = 20+3.23.

Hence the lower quartile=23.23.

Again ¾(514)=385.5.

But the 318th candidate from the bottom gets approximately 35
marks, and the 397th candidate from the bottom gets approximately
40 marks. Therefore, the imaginary candidate, No. 385.5,
should get approximately a number of marks

=35 + 5 x 67.5/79
=39.27.

Hence the upper quartile=39.27.
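The interpolation for both quartiles follows one pattern, which may be sketched in Python as below. The function and its argument names are introduced here for illustration; the ranks and marks are those quoted in the text.

```python
def interpolate_mark(position, rank_low, mark_low, rank_high, mark_high):
    """Marks of an imaginary candidate lying between two known candidates,
    assuming marks rise evenly with rank between them."""
    fraction = (position - rank_low) / (rank_high - rank_low)
    return mark_low + fraction * (mark_high - mark_low)


total_candidates = 514

# Candidate No. 128.5 lies between the 91st (20 marks) and 149th (25 marks).
lower_quartile = interpolate_mark(total_candidates / 4, 91, 20, 149, 25)
# Candidate No. 385.5 lies between the 318th (35 marks) and 397th (40 marks).
upper_quartile = interpolate_mark(3 * total_candidates / 4, 318, 35, 397, 40)

print(round(lower_quartile, 2), round(upper_quartile, 2))
```

Half the difference between the two results, about 8.02 marks, is the semi-interquartile range for this example.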
It is clear that the quartiles together with the median divide the
whole series of observations into approximately four equal groups, so
that the quartile marks give a rough idea of the distribution on either
side of the average.

[Diagram: a line marked at the lower quartile Q (23.23), the median (31.52),
and the upper quartile Q' (39.27).]

For this reason half the difference between the quartiles provides a
convenient measure of the dispersion, and it is called the quartile
deviation or semi-interquartile range; thus, if Q be the lower and
Q' the upper quartile, we have

the quartile deviation=½(Q'-Q).

In the above example, this measure

=½(39.27-23.23)
=½(16.04)
=8.02.

If  a  more  minute  analysis  of  the  distribution  of  variables  is 
desired,  we  may  range  them  in  order  of  magnitude  as  before,  and 
divide  up  the  series  into  ten  equal  parts,  recording  every  tenth  along 
the  line  ;  these  tenths  are  called  deciles. 

Thus,  the  deciles  in  the  examination  example  correspond  to  the 
marks  assigned  to  imaginary  candidates  numbered  as  follows  : — 

51.4, 102.8, 154.2, 205.6, 257.0, 308.4, 359.8, 411.2, 462.6,
and  they  can  be  calculated  by  the  interpolation  method  used  in 
finding  the  median  and  quartiles. 
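The numbering of these imaginary candidates is simply successive tenths of the total, as the following brief Python sketch shows (the variable names are introduced here for illustration).

```python
total_candidates = 514

# The k-th decile falls at k tenths of the way along the ordered series.
decile_positions = [k * total_candidates / 10 for k in range(1, 10)]
print(decile_positions)
```

The fifth of these positions, 257.0, is also that of the median candidate.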

This  way  of  representing  the  chief  features  of  a  distribution,  by 
quartiles,  etc.,  was  much  used  by  Galton  in  his  researches  and 
writings. 

The  student  may  be  perplexed  as  to  which  should  be  used  of  so 
many  different  measures  of  dispersion  or  variability,  but  there 
need  be  no  real  confusion.  If  a  rough  estimate  only  is  wanted  the 
quartile  deviation  is  a  convenient  measure,  assuming  that  the 
variables  observed  or  measured  can  be  ranged  in  order  of  magnitude 
so  as  to  admit  of  the  quartiles  being  readily  picked  out.  Also  the 
measure  thus  obtained  is  not  unsatisfactory  when  the  distribution 
of  values  of  the  variable  is  fairly  symmetrical  and  uniform  in  its 
gradation  from  greatest  frequency  to  least.  If,  however,  it  is 
conspicuously skew (unsymmetrical) and there are erratic differences
in frequency between successive values of the variable, it
is  better  to  choose  a  measure  which  gives  the  magnitude  and 
the  position  of  each  recorded  observation  its  due  weight  in  the 
deviation  sum. 

Then again the choice as between the standard deviation and the
mean deviation may be sometimes determined by the particular
kind of average which suits the problem best. But as the arithmetic
mean is the most important and the most commonly used
average,  so  the  standard  deviation  is  certainly  the  most  important 
measure  of  dispersion. 

It will be shown later that the following relations are approximately
true when the distribution of variables is not very far from
being symmetrical:

(1) Quartile deviation = 2/3 (Standard deviation).

(2) Mean deviation = 4/5 (Standard deviation).

In  (2)  the  mean  deviation  should  be  measured  from  the  mean. 

Also  (3)  a  range  of  two  or  three  times  the  standard  deviation 
will  be  found  to  include  the  majority  of  the  observations  which 
make  up  the  distribution. 

Coefficient  of  Variation.  Before  we  pass  on  to  illustrate  the 
subject  of  averages  and  variability  by  means  of  a  few  examples 
it  is  necessary  to  introduce  one  more  constant  known  as  the
coefficient  of  variation.  It  is  a  measure  of  variability  but  it  differs
from  the  chief  measures  already  discussed  in  that  they  are  absolute 
measures,  whereas  the  coefficient  of  variation,  written  C.  of  V.  for 
short,  is  a  ratio  or  relative  measure.  The  need  for  it  arises  when 
we  reflect  that  in  order  to  gauge  fairly  the  amount  of  variability  we 
ought  to  have  in  mind  also  the  size  of  the  mean  from  which  the 
variation  is  measured ;  just  as  a  difference  of  1  foot  between  the 
heights  of  two  men  is  a  conspicuous  difference  when  the  normal 
height  is  between  5  and  6  feet,  whereas  the  same  difference  of  1  foot 
between  two  measured  miles  would  be  trifling  because  the  standard 
mile  contains  over  5000  feet. 

The coefficient of variation has been defined by Karl Pearson
(Phil. Trans., vol. 187A, p. 277), who first suggested its use, as 'the
percentage variation in the mean, the standard deviation (S.D.)
being treated as the total variation in the mean,' so that

C. of V. = 100 S.D./Mean.

He pointed out that it would be idle, in dealing with the variation
of men and women (or indeed very often of the two sexes of any
animal), to compare the absolute variation of the larger male organ
directly with that of the smaller female organ, because several of
these organs, as well as the height, the weight, brain capacity, etc.,
are greater in man than in woman in the approximate proportion
of 13 : 12.

As an example of the use of the C. of V., figures may be quoted
from a paper by R. Pearl and F. J. Dunbar (Biometrika, vol. ii.
pp. 321 et seq.), On Variation and Correlation in Arcella. Measurements
in microns were made of the outer and inner diameters of
504 specimens of a shelled rhizopod belonging to the group Imperforata,
family Arcellina, with the following results, to two decimal
places:

                      Mean.      S.D.      C. of V.

Outer diameter        55.79      5.73      10.27 per cent.
Inner diameter        15.91      2.17      13.66 per cent.

Thus,  judging  by  the  S.D.  column,  giving  the  absolute  size  of 
deviation,  the  outer  diameter  would  appear  to  be  more  variable 
than  the  inner,  but  the  C.  of  V.  column  shows  that,  if  we  take  the 
sizes of the two diameters into account, the inner is really the
more variable of the two. To turn aside the edge of possible criticism
it should be added that the authors also give the errors to
which the above measures are subject, as unless these are known
we cannot tell whether the differences observed in variation are
significant  or  not  of  a  real  difference  in  fact,  but  that  question 
must  be  left  until  the  theory  of  errors  due  to  sampling  has  been 
developed  in  a  later  chapter. 
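
The comparison just made can be reproduced in a few lines. The sketch below is illustrative only: the function name is our own, and it works from the rounded means and standard deviations printed above, so the inner-diameter figure may differ in the second decimal place from the published 13.66, which was presumably computed from unrounded values.

```python
def coefficient_of_variation(mean: float, sd: float) -> float:
    """Return 100 * S.D. / Mean, i.e. the C. of V. expressed as a percentage."""
    if mean == 0:
        raise ValueError("the C. of V. is undefined when the mean is zero")
    return 100.0 * sd / mean


# Means and S.D.s as quoted in the text, rounded to two decimal places.
outer_cv = coefficient_of_variation(mean=55.79, sd=5.73)
inner_cv = coefficient_of_variation(mean=15.91, sd=2.17)

print(f"outer diameter: {outer_cv:.2f} per cent.")
print(f"inner diameter: {inner_cv:.2f} per cent.")

# The absolute S.D. is larger for the outer diameter, yet the relative
# measure is larger for the inner diameter.
print("inner is relatively more variable:", inner_cv > outer_cv)
```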

The C. of V. varies considerably for different characters. W. R.
Macdonell states that '3 to 5.5 are representative values for variability
in man, while in plants it may run to 40,' and Pearson and others
have shown that for stature in man it varies from about 3 to 4
and for the length of long bones from 4 to 6.


CHAPTER   VII 

FREQUENCY   DISTRIBUTION  :     EXAMPLES    TO  ILLUSTRATE 
CALCULATING   AND    PLOTTING  :     SKEWNESS 

Calculation  of  Mean  and  Standard  Deviation.  Example  (1). — We 
return  now  to  the  examination  example  in  order  to  show  how  the 
labour  of  calculation  in  finding  the  arithmetic  mean  and  standard 
deviation  of  a  frequency  distribution  may  be  somewhat  lessened. 

The  various  steps  in  the  process  appear  in  Table  (14).  In  the 
first  column  the  marks  at  the  middle  of  each  class-interval  have 
been  written  down,  and  we  make  the  assumption  that  all  the  candi- 
dates in  any  one  class  have  the  same  number  of  marks,  namely,  the 
marks  at  the  middle  of  the  class-interval.  In  any  case  where  the 
number  of  observations  is  large,  and  where  the  class-intervals  are
reasonably  small,  the  errors  resulting  from  such  an  assumption  will 
be  insignificant,  because  the  individuals  in  each  class  are  just  as 
likely  to  have  values  above  as  below  the  value  at  the  middle  of  the 
class-interval,  and  they  will  therefore  compensate  for  one  another. 

We  now  seek  to  alter  the  scale  of  marking  so  as  to  produce  a 
simpler  set  of  marks  than  the  original,  which  will  make  the  work
of  finding  the  mean  also  simpler,  but  we  must  not  forget  at  the 
end  to  change  back  again  to  the  original  scale.  We  choose  a  number 
from  col.  (1),  somewhere  near  the  required  mean,  to  act  as  a  kind 
of  origin  from  which  to  measure  the  other  numbers  in  the  column. 
This  choice  is  only  a  rough  guess,  and  it  is  really  immaterial  which 
number  is  selected  as  origin,  except  that  the  nearer  it  is  to  the 
mean  the  lighter  will  be  the  calculation  to  follow ;  the  number  33 
has  been  selected  in  this  instance. 

In  col.  (2)  are  written  down  the  deviations  of  the  marks  in  each 
class  from  33,  so  that  now  some  candidates  appear  as  if  they  were 
5,  10,  15  .  .  .  marks  to  the  bad,  and  others  as  if  they  were  5,  10, 
15  ...  to  the  good.  So  long  as  we  remember  to  add  33  at  the 
end  we  can  content  ourselves  therefore  by  finding  the  mean  of  the 
marks  as  given  in  col.  (2).  But  these  again  can  be  further  simplified 
by  dividing  each  candidate's  marks  by  5,  and  we  then  only  need
to  find  the  mean  of  the  marks  as  shown  in  col.  (3),  so  long  as  we
remember  to  multiply  by  5  at  the  first  step  back  to  the  old  scale 
of  marking.  The  addition  of  col.  (5)  makes  it  easy  to  calculate 
this  mean,  for  it  gives  the  result  of  multiplying  each  value  of  the 
variable  (the  number  of  marks  in  each  class)  by  its  appropriate 
weight  (the  number  of  candidates  who  obtained  that  number  of 
marks). 


Table (14). Marks obtained by 514 Candidates in a certain
Examination (Analysis of Method for Calculating
Mean and Standard Deviation).

   (1)             (2)              (3)           (4)             (5)                (6)

Marks on      Deviation of       Marks on      Frequency      Product of         Product of
old scale.    Nos. in Col. (1)   new scale.    of             Nos. in            Nos. in
              from 33.                         Candidates.    Cols. (3) & (4).   Cols. (3) & (5).
                                   (x)           (f)            (fx)               (fx²)

 3 = 33-30        -30              -6              5             -30                180
 8 = 33-25        -25              -5              9             -45                225
13 = 33-20        -20              -4             28            -112                448
18 = 33-15        -15              -3             49            -147                441
23 = 33-10        -10              -2             58            -116                232
28 = 33- 5        - 5              -1             82            - 82                 82
33 = 33            ..              ..             87              ..                 ..
38 = 33+ 5        + 5              +1             79            + 79                 79
43 = 33+10        +10              +2             50            +100                200
48 = 33+15        +15              +3             37            +111                333
53 = 33+20        +20              +4             21            + 84                336
58 = 33+25        +25              +5              6            + 30                150
63 = 33+30        +30              +6              3            + 18                108

                                                 514            -110               2814

Thus, on this new scale, the mean marks obtained are

$$\frac{5(-6)+9(-5)+28(-4)+\dots+87(0)+\dots+6(+5)+3(+6)}{514} = \frac{-532+422}{514} = \frac{-110}{514} = -0.214.$$

This, then, is the mean of the marks obtained by the candidates on
the scale indicated in col. (3). If the marks are on the scale given
in col. (2), the mean is 5(-0.214), i.e. -1.070. To bring them back
to the original scale as in col. (1) we must add 33 to this result, so
that the required arithmetic mean

$$= 33 + 5(-0.214) = 33 - 1.070 = 31.93.$$
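
The change of origin and unit described above may be sketched as follows. The function and variable names are our own and serve only as illustration; the marks and frequencies are those of Table (14). The sketch also computes the mean directly on the old scale, to show that the two routes agree.

```python
# Class mid-marks, col. (1), and frequencies, col. (4), of Table (14).
marks = [3, 8, 13, 18, 23, 28, 33, 38, 43, 48, 53, 58, 63]
freqs = [5, 9, 28, 49, 58, 82, 87, 79, 50, 37, 21, 6, 3]

ORIGIN = 33  # arbitrary origin chosen somewhere near the mean
UNIT = 5     # width of the class-interval, taken as the new unit


def coded_mean(values, weights, origin, unit):
    """Find the mean on the simplified scale, then change back to the old scale."""
    # Col. (3): every mark here differs from the origin by a whole number of units.
    coded = [(v - origin) // unit for v in values]
    total = sum(weights)                                  # sum of frequencies
    sum_fx = sum(f * x for f, x in zip(weights, coded))   # total of col. (5)
    mean_on_new_scale = sum_fx / total
    return origin + unit * mean_on_new_scale, sum_fx, total


mean, sum_fx, total = coded_mean(marks, freqs, ORIGIN, UNIT)

# Direct calculation on the old scale, for comparison.
direct_mean = sum(f * v for f, v in zip(freqs, marks)) / sum(freqs)

print(sum_fx, total, round(mean, 2), round(direct_mean, 2))
```

Any other origin would give the same final mean; only the size of the intermediate numbers changes.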

To find the Standard Deviation, or the root-mean-square deviation
from the arithmetic mean, it is convenient as before to work with
the simplified scale, to measure the deviations from the arbitrary
origin (33) associated with that scale, and to make the necessary
corrections at the end of the work.

Col. (5) in Table (14) gives the deviation multiplied by the
frequency in each class, the frequency denoting the number of
times the particular deviation occurs. Hence, if these numbers be
multiplied again by the numbers in col. (3), we shall have each
separate deviation squared and multiplied by its frequency. The
results are shown in col. (6), and they must be added, and their
sum divided by the sum of the frequencies (514), to give the mean-square
deviation, which we may represent by s².

Thus

$$s^2 = 2814/514 = 5.475,$$

and this is the mean-square deviation referred to 33 as origin.
We require the corresponding expression referred to the mean,
31.93, as origin. If we denote this by s_m² there is a simple relation
connecting the two, namely,

$$s_m^2 = s^2 - \bar{x}^2,$$

where x̄ is the deviation of the mean itself from 33 [see Appendix,
Note 5]; of course s_m, s, and x̄ are all to be measured on the same
scale, the simplified scale adopted with 5 marks as unit.

Now we have already shown that the deviation of the mean from
33 = -0.214, and this is therefore the value of x̄.

Hence

$$s_m^2 = 5.475 - (-0.214)^2 = 5.475 - 0.046 = 5.429 = (2.33)^2.$$

And, returning to the old scale, the standard deviation, usually
denoted by σ,

$$= 5(2.33) = 11.65.$$

We notice that 3σ = 34.95, and this range on either side of the
mean amply takes in all the observations.
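
The same steps may be followed numerically. The sketch below uses the simplified scale of col. (3) and the frequencies of col. (4); the variable names are ours. It applies the relation "mean-square about the mean = mean-square about the origin, less the square of the deviation of the mean from the origin," and then checks the result against a direct root-mean-square calculation about the mean.

```python
import math

coded = [-6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6]     # col. (3)
freqs = [5, 9, 28, 49, 58, 82, 87, 79, 50, 37, 21, 6, 3]  # col. (4)
UNIT = 5                                                  # marks per unit of the new scale

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, coded))          # total of col. (5)
sum_fx2 = sum(f * x * x for f, x in zip(freqs, coded))     # total of col. (6)

s2_about_origin = sum_fx2 / n          # mean-square deviation referred to 33
x_bar = sum_fx / n                     # deviation of the mean from 33, new scale
s2_about_mean = s2_about_origin - x_bar ** 2
sigma = UNIT * math.sqrt(s2_about_mean)   # back on the old scale of marks

# Direct check: root-mean-square deviation measured from the mean itself.
direct_sigma = UNIT * math.sqrt(
    sum(f * (x - x_bar) ** 2 for f, x in zip(freqs, coded)) / n
)

print(sum_fx2, round(s2_about_origin, 3), round(sigma, 2), round(3 * sigma, 2))
```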

The mean deviation is readily found from Table (14) by adding up
the numbers in col. (5) regardless of sign and dividing by the sum
of frequencies, 514.

Thus, on the new scale, the mean deviation

$$= \frac{954}{514} = 1.856,$$

which, on the old scale, becomes 5(1.856) or 9.28. This, however,
is the mean deviation measured from 33 as origin, and a correction
has to be applied to get the mean deviation measured from the
median or from the mean.

To get the mean deviation from the mean we note that the
difference between the mean, 31.93, and 33 is 1.07. Hence it
should be clear from Table (14) that, by measuring from 33 instead
of from 31.93, we have made the deviations of all the marks from
33 upwards too little by 1.07, and we have made the deviations of
all the marks from 28 downwards too much by 1.07. Hence, to
get the deviation required we must add to 9.28 an amount

$$= \frac{1}{514}\left[1.07(87+79+\dots+3) - 1.07(82+58+\dots+5)\right] = \frac{1.07}{514}(283-231) = \frac{1.07}{514}\times 52 = 0.108.$$

Therefore, the mean deviation measured from the mean = 9.39.
This may be compared with 4/5 (standard deviation) = 9.32.

Also the quartile deviation for this distribution has been shown
to be = 8.02, and it may be compared with 2/3 (standard deviation)
= 7.77.
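
The correction for change of origin can be verified by computing the mean deviation in both ways. In the sketch below (variable names are our own) the short-cut adds to the deviation from 33 the shift of 1.07 multiplied by the excess of candidates at or above 33 over those below it; the direct method simply averages the absolute deviations from the mean.

```python
marks = [3, 8, 13, 18, 23, 28, 33, 38, 43, 48, 53, 58, 63]  # col. (1) of Table (14)
freqs = [5, 9, 28, 49, 58, 82, 87, 79, 50, 37, 21, 6, 3]    # col. (4) of Table (14)
ORIGIN = 33

n = sum(freqs)
mean = sum(f * v for f, v in zip(freqs, marks)) / n

# Mean deviation measured from the arbitrary origin, on the old scale.
md_from_origin = sum(f * abs(v - ORIGIN) for f, v in zip(freqs, marks)) / n

# Candidates whose deviations were made too little, and too much, by the shift.
shift = ORIGIN - mean
n_at_or_above = sum(f for f, v in zip(freqs, marks) if v >= ORIGIN)
n_below = sum(f for f, v in zip(freqs, marks) if v < ORIGIN)
correction = shift * (n_at_or_above - n_below) / n

md_shortcut = md_from_origin + correction
md_direct = sum(f * abs(v - mean) for f, v in zip(freqs, marks)) / n

print(n_at_or_above, n_below, round(correction, 3), round(md_shortcut, 2))
```

The short-cut is exact here because no class mid-mark lies between the mean and the origin.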

Plotting of a Frequency Distribution. The data for the two
examples which follow are taken from the Quarterly Return of
Marriages, Births, and Deaths, No. 261, issued by the Registrar-General.

The first shows the proportion to population of cases of infectious
disease notified in 241 large towns of England and Wales for the
thirteen weeks ended 4th April 1914. This proportion was given
for each town separately in the Return, but, in order to bring out
the distinctive features of the distribution, the several towns have
been, in Table (15), represented by dots and put into different classes
according to the proportion of infectious cases notified in each,
with a separate line for each class: e.g. if the proportion for any
town was 5.37 a dot was placed in the line corresponding to the
class of towns for which the rate was '4 and less than 6.' Every
fifth dot in each line was ticked off, so as to make them easy to
count up and also to keep the lines, down the paper as well as
across, straight. The frequency, i.e. the number of dots in each
class, was then recorded in a column at the extreme right-hand
side of the paper.

Table (15). Proportion to Population of Cases of Infectious
Disease notified in 241 Large Towns of England and
Wales during the Thirteen Weeks ended 4th April 1914.

Case Rate per 1000 persons living (first column). Each dot represents One Town
with Notified Rate of Infectious Disease between limits as given in the first
column. Total No. of Towns with given Rate (last column).

 0-   ....|                                                                    5
 2-   ....|....|....|....|....|....|....|....                                 39
 4-   ....|....|....|....|....|....|....|....|....|....|....|....|....|....   69
 6-   ....|....|....|....|....|....|....|....|.                               41
 8-   ....|....|....|....|....|....                                           29
10-   ....|....|....|....|..                                                  22
12-   ....|....|....|.                                                        16
14-   ....|..                                                                  7
16-   ....|                                                                    5
18-   ...                                                                      3
20-   ....                                                                     4
22-                                                                            0
24-                                                                            0
26-   .                                                                        1

                                                                             241

[Fig. (1). Horizontal axis: Rate of Disease per 1000 persons living.]

It will be at once seen that this procedure, without calculating
any averages, etc., ultimately gives to the eye a very good picture
of the distribution, and indeed it is the basis of the graphical method
of studying statistics. In drawing a proper graph we use a specially
ruled sheet of paper which is divided up into a large number of
equal small squares by 'horizontal' (cross) and 'vertical' (up-and-down)
lines. This merely enables us to place our dots accurately
in position, as shown in fig. (1), where the numbers 0, 5, 10 . . .
have been marked off along the line Ox to correspond to 'case
rates' of these magnitudes: thus rates of '4 and less than 6'
were recorded by 69 successive dots along a vertical line at a distance
5 (the centre of the class-interval 4-6) from the axis Oy.
The final configuration in fig. (1), when turned half round, is
exactly the same as that of Table (15). If desired the frequency
may be recorded, dot by dot, on a side piece of paper and then
only the topmost dot in each class need be marked on the graph
sheet. In order, however, to enable the eye to measure the height
of each frequency in relation to the rest, it is advisable in that
case to connect up adjacent dots as in fig. (2) or as in fig. (3).

[Fig. (2). Horizontal axis: Rate of Disease per 1000 persons living.]
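
The tallying that underlies Table (15) and the dot diagram may be sketched as follows. The sample rates below are invented purely for illustration (the individual town rates are not reproduced in the text), and the function names are our own. Each rate is placed in the class 'lower limit and less than the next limit', and every fifth mark in a line is shown as a tick.

```python
import bisect


def tally_line(count: int) -> str:
    """Return `count` marks, every fifth mark shown as a tick '|' instead of a dot."""
    return "".join("|" if k % 5 == 0 else "." for k in range(1, count + 1))


def classify(rates, lower_limits):
    """Count the rates falling in each class 'limit and less than the next limit'.

    Rates at or above the last limit are counted in the last class.
    """
    counts = [0] * len(lower_limits)
    for rate in rates:
        if rate < lower_limits[0]:
            raise ValueError(f"rate {rate} lies below the first class limit")
        counts[bisect.bisect_right(lower_limits, rate) - 1] += 1
    return counts


# Hypothetical rates, for illustration only (not taken from the Return).
sample_rates = [5.37, 4.0, 5.99, 6.0, 1.2, 7.5, 4.4, 5.1, 4.9]
lower_limits = [0, 2, 4, 6]

counts = classify(sample_rates, lower_limits)
for low, count in zip(lower_limits, counts):
    print(f"{low:>2}-   {tally_line(count):<10} {count}")
```

Note that a rate of exactly 6 is placed in the class '6 and less than 8', in keeping with the class definitions of the table.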

The  last  method  of  representation  (fig.  (3)),  to  which  the  name 
histogram  has  been  given  by  Professor  Karl  Pearson,  is  particularly 
useful  and  should  be  carefully  studied.  It  is  formed  in  this  case  by 
erecting  a  succession  of  rectangles  with  the  lines  02,  24,  46  .  .  . 
along  Ox  as  their  bases,  corresponding  to  the  successive  classes  of
the  given  distribution,  and  with  heights  proportional  to  the  fre- 
quencies proper  to  those  classes.  It  is  not  necessary  to  complete 
the  sides  of  the  rectangles,  but,  if  they  were  completed,  each  would 
enclose  a  number  of  squares  proportional  to  the  frequency  of  towns 
with  the  rate  of  disease  defined  by  its  base  :  e.g.  the  first  rectangle 
would  enclose  10  squares,  the  second  78,  the  third  138,  and  so  on, 
numbers  respectively  proportional  to  5,  39,  69,  and  so  on.  It 
follows  that  the  total  area  enclosed  between  the  histogram  and  the 
axis  Ox  is  proportional  to  the  aggregate  frequency  of  towns  observed. 

Now  we  might  conceive  a  step  further  taken  and  a  smoothed 
curve  drawn  freehand  so  as  to  agree  as  closely  as  possible  with 
fig.  (2)  or  fig.  (3),  but  with  all  the  sharp  corners  smoothed  out,  and 
so  nicely  adjusted  as  to  make  the  area  enclosed  between  the  curve, 
the  axis  Ox,  and  lines  parallel  to  Oy  defining  the  limits  of  any  class, 
proportional  to  the  frequency  of  towns  in  that  class.  To  this 
fig.  (2)  and  fig.  (3)  might  be  regarded  as  approximating  if  only  a 
sufficient  number  of  observations  were  recorded,  and  only  in  that 
case  would  it  be  possible  to  draw  it  with  any  accuracy.  Such  a 
curve  is  called  a  frequency  curve,  measuring  as  it  does  the  frequency 
of  the  observations  in  different  classes. 

[Assuming that corresponding to a given frequency distribution a curve
of this kind does really exist (and the assumption turns upon the frequency
being continuous), the reader who is acquainted with the notation of the
Calculus will recognise that, if (x, y) represents any point on the curve, y δx
measures the frequency of observations or measurements of an organ or
character lying between the values x and (x + δx), when the total frequency
comprises a large number of observations, say 500 to 1000.

Further, it will appear later that the mean, the median, and the mode
have a geometrical interpretation of no small importance associated with the
curve.

The mean x̄ corresponds to the particular ordinate y which passes through
the centroid or centre of gravity of the area between the frequency curve
and axis Ox, because

$$\text{the mean} = \sum (x \cdot y\,\delta x) \Big/ \sum (y\,\delta x),$$

where the summation extends throughout the distribution,

$$= \int x y\,dx \Big/ \int y\,dx,$$

where the integral extends throughout the curve.

The median value of x corresponds to the ordinate y which bisects this same area;
e.g. in fig. (3), the number of small squares on either side of the median in the
space bounded by the histogram and the axis represents half the total number
of observations, two small squares corresponding to each observation.

The modal value of x corresponds to the maximum ordinate of the curve, measuring
the greatest frequency in the whole distribution.]

Skewness.  There  is  one  feature  of  a  frequency  distribution  which 
catches  the  eye  sooner  almost  than  any  other,  and  that  is  its  sym- 
metry or  lack  of  symmetry.  It  is  important  therefore  that  we 
should  have  some  means  of  measuring  it. 

In  a  symmetrical  distribution  the  mean,  mode,  and  median 
coincide,  and  we  have,  as  it  were,  a  perfect  balance  between  the 
frequency  of  observations  on  either  side  of  the  mode  or  ordinate  of 
maximum  frequency.  In  a  skew  distribution  the  centre  of  gravity 
is  displaced  and  the  balance  thrown  to  one  side  :  the  amount  of  this 
displacement  measures  the  skewness.  But  there  is  another  factor 
to  be  taken  into  account,  for  when  the  variability  of  the  distribu- 
tion is  great  the  balance  is  more  sensitive  than  when  it  is  small, 
and  the  difference  between  mean  and  mode  is  consequently  more 
pronounced though it may not be significant of any greater skewness.
This will be clear in the light of the analogy of the swing
of a pendulum. If OPP' denote the pendulum in the accompanying
figure, OAA' its mean position, and OBB' an extreme position, the
displacement in the position OPP' from the mean, if measured
along the scale AB, is AP, and, if measured along the scale A'B',
is A'P'. But, since the amount of swing in either case is the same,
it would be more appropriate to write the linear displacement as a
fraction of the full swing so as to make these two measures also the
same, thus

AP/AB = A'P'/A'B'.

So, in the case of a frequency distribution, Professor Karl Pearson
has suggested as a suitable measure for skewness,
not the difference between mean and mode, but the ratio of this
difference to the variability. Thus

skewness = (mean - mode)/S.D.

or, approximately,

= 3(mean - median)/S.D. (see p. 39),

a form which is sometimes useful.

According to this convention the skewness is regarded as positive
when the mean is greater than the mode, and as negative when
the mode is greater than the mean.

[Figure: two frequency curves, each drawn with x increasing to the right. In the
first (Skewness +) the mode lies to the left of the mean; in the second
(Skewness -) the mean lies to the left of the mode.]

Illustrations  of  frequency  curves,  with  the  position  of  mode  and 
mean  marked,  will  be  found  in  Chapter  xvii. 

We  proceed  to  the  detailed  calculations  necessary  in  the  infectious 
diseases  example. 


Table (16). Proportion to Population of Cases of Infectious
Disease notified in 241 Large Towns of England and
Wales during the Thirteen Weeks ended 4th April 1914.

        (1)               (2)           (3)              (4)                (5)

Case Rate per          Deviation    Frequency of     Product of         Product of
1000 persons living.   from 7.      Towns with       Nos. in            Nos. in
                                    given Rate.      Cols. (2) & (3).   Cols. (2) & (4).
                         (x)           (f)              (fx)               (fx²)

 0 and less than  2      - 3            5               -15                 45
 2      "         4      - 2           39               -78                156
 4      "         6      - 1           69               -69                 69
 6      "         8       ..           41                ..                 ..
 8      "        10      + 1           29               +29                 29
10      "        12      + 2           22               +44                 88
12      "        14      + 3           16               +48                144
14      "        16      + 4            7               +28                112
16      "        18      + 5            5               +25                125
18      "        20      + 6            3               +18                108
20      "        22      + 7            4               +28                196
26      "        28      +10            1               +10                100

                                      241               +68               1172

Example (2). The various averages and measures of variability
of the distribution can be calculated just as in the case of the last
example, and the data required to determine the mean and the
standard deviation are set out in Table (16). We can afford now
to miss out some of the more obvious steps in explanation.

On the scale of col. (2), where a difference of 2 in the case rate,
per 1000 persons living, is the unit and where a case rate of 7 is
taken as origin, the mean, by the result of col. (4),

$$= \frac{68}{241} = 0.282.$$

Hence, on the original scale, the mean

$$= 7 + 2(0.282) = 7.564.$$

Again, the mean-square deviation, on the scale of col. (2), measured
from 7 as origin is

$$= \frac{1172}{241} = 4.863;$$

and x̄, the deviation of the mean from 7 as origin, on the scale of
col. (2) = 0.282. Thus the mean-square deviation measured from
the mean,

$$= 4.863 - (0.282)^2 = 4.783.$$

Therefore, the standard deviation σ, on the original scale

$$= 2\sqrt{4.783} = 4.374.$$

Since 3σ = 13.122, the range '(mean - 3σ) to (mean + 3σ)' includes
all but one or two observations.

To  determine  the  median,  we  conceive  the  towns  ranged  in  order 
according  to  the  proportion  of  infectious  cases  notified  in  each, 
from  the  least  to  the  greatest,  and  the  town  with  the  median  rate 
is  the  121st  from  either  end. 

But  the  113th  town  has  a  notified  case  rate  of  approximately  6 
per  1000,  and  the  154th  town  has  a  notified  case  rate  of  approxi- 
mately 8  per  1000. 

Thus a difference of 41 towns corresponds to a difference of 2 in
the rate, hence a difference of 8 towns corresponds to a difference
of 0.39 in the rate; therefore the median rate = 6.39 approximately.
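
The proportional-parts argument just used may be put in general form. The sketch below is illustrative, with a function name of our own choosing; it assumes, as the text does, that the towns within a class are spread evenly across the class-interval, and it takes the rank of the required town as given (121 for the median).

```python
def value_at_rank(lower_limits, width, freqs, rank):
    """Interpolate the value of item number `rank` within the class containing it.

    Assumes the items of each class are spread evenly over the class-interval.
    """
    cumulative = 0
    for low, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= rank:
            return low + width * (rank - cumulative) / f
        cumulative += f
    raise ValueError("rank exceeds the total frequency")


# Lower class limits and frequencies of the infectious disease distribution.
lower_limits = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26]
freqs = [5, 39, 69, 41, 29, 22, 16, 7, 5, 3, 4, 0, 0, 1]

median = value_at_rank(lower_limits, width=2, freqs=freqs, rank=121)
print(round(median, 2))
```

The same function gives the quartiles when the rank is set to one-quarter and three-quarters of 241.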

By referring to the original records and writing down the rate
for each town in the group 'rate 6 and less than 8' in which the
median lay, the accurate value of the median turned out to be 6.30.
The lower quartile or case rate of the imaginary town, No. 1/4 (241),
or 60.25, one-quarter way along the ordered sequence of towns, is
readily shown to be 4.47, and the upper quartile or case rate of
town No. 3/4 (241), or 180.75, is 9.84.
Hence the quartile deviation

$$= \tfrac{1}{2}(9.84 - 4.47) = 2.69.$$

With this may be compared 2/3 (S.D.) = 2/3 (4.37) = 2.92.
Again, the mean deviation measured from 7

$$= 2\left(\frac{392}{241}\right) = 3.253.$$

Measured from the mean, it becomes

$$= 3.253 + \frac{0.564}{241}\left[(41+69+39+5) - (29+22+16+7+5+3+4+1)\right] = 3.253 + (0.564)(67)/241 = 3.41,$$

and this may be compared with 4/5 (S.D.) = 4/5 (4.374) = 3.50.

If we estimate the mode by inspection of the frequency graphs in
figs. (2) and (3), we should say it comes between 5 and 6; supposing
we call it 5.5, very roughly.

In this case, taking the values actually calculated for mean and
median,

(mean - mode) = 7.56 - 5.50 = 2.06,

and 3(mean - median) = 3(7.56 - 6.39) = 3(1.17) = 3.51;

so that the rule

(mean - mode) = 3(mean - median)

is far from being true according to these results; this is partly due,
of course, to the very unsymmetrical character of the distribution.

The relative positions of the mean, median, and modal points
as calculated are indicated in figs. (2) and (3) by three lines drawn
parallel to Oy through these points to meet the graph.

Finally, skewness = (mean - mode)/S.D. = 2.06/4.37 = 0.47.
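
Both forms of the skewness measure may be evaluated side by side. The sketch below uses the rounded values obtained above for mean, median, and S.D., together with the rough estimate 5.5 for the mode; the function names are ours and are offered only as illustration.

```python
def skewness_from_mode(mean: float, mode: float, sd: float) -> float:
    """Skewness as (mean - mode) / S.D."""
    return (mean - mode) / sd


def skewness_from_median(mean: float, median: float, sd: float) -> float:
    """Approximate form 3 (mean - median) / S.D."""
    return 3.0 * (mean - median) / sd


mean, median, mode, sd = 7.56, 6.39, 5.5, 4.37   # the mode is a rough estimate only

by_mode = skewness_from_mode(mean, mode, sd)
by_median = skewness_from_median(mean, median, sd)

print(round(by_mode, 2), round(by_median, 2))
```

The wide gap between the two figures reflects the failure of the rule (mean - mode) = 3(mean - median) for this very unsymmetrical distribution, and the roughness of the estimate of the mode.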

Example (3). The next example deals with the deaths of infants
under one year, out of every thousand born, in 100 great towns in
the United Kingdom during the thirteen weeks ended 4th April 1914.
The details of the calculation may be left in this case to the reader,
who is recommended to follow the method shown in the last example
so far as possible throughout, including the plotting of the distribution
in different ways. The statistics are as follows:

Table (17). Death Rate of Infants under 1 Year
per 1000 Births.

       (1)                  (2)                    (3)                   (4)

   Death Rate.        No. of Towns            Death Rate.          No. of Towns
                      with Death Rate                              with Death Rate
                      as in Col. (1).                              as in Col. (3).

  30 and under  40          1             120 and under 130             16
  50     "      60          3             130     "     140             11
  60     "      70          2             140     "     150             10
  70     "      80          6             150     "     160              8
  80     "      90          7             160     "     170              3
  90     "     100          6             170     "     180              1
 100     "     110         11             200     "     210              1
 110     "     120         13             240     "     250              1

The more important results are:
Arithmetic mean = 118.9; S.D. = 32.2;
median = 120.9; quartile deviation = 19.5.
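
A reader who wishes to check the arithmetic mean and standard deviation stated above may do so as follows. The sketch takes each town at the middle of its class-interval, as in the earlier examples; the function name is our own.

```python
import math


def grouped_mean_sd(midpoints, freqs):
    """Mean and S.D. of a frequency distribution, each class taken at its mid-value."""
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
    mean_square = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / n
    return n, mean, math.sqrt(mean_square)


# Lower limits of the classes in Table (17); every class is 10 wide.
lower_limits = [30, 50, 60, 70, 80, 90, 100, 110,
                120, 130, 140, 150, 160, 170, 200, 240]
freqs = [1, 3, 2, 6, 7, 6, 11, 13,
         16, 11, 10, 8, 3, 1, 1, 1]
midpoints = [low + 5 for low in lower_limits]

n, mean, sd = grouped_mean_sd(midpoints, freqs)
print(n, round(mean, 1), round(sd, 1))
```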


Example (4). As another example corresponding details may be
worked out for the following temperature records taken at noon
at a certain spot in Chester week by week during a period of time
covering five years, the results in this case being:
mean = 55.10; S.D. = 10.33;
median = 54.88; quartile deviation = 7.94.

Table (18). 257 Weekly Records of Temperature (Fahrenheit).

      (1)                 (2)                    (3)                 (4)

  Temperature        No. of Records          Temperature        No. of Records
  Limits in          between Limits          Limits in          between Limits
  Degrees.           shown in Col. (1).      Degrees.           shown in Col. (3).

  25.5-29.5               1                  53.5-57.5               30.5
  29.5-33.5               1                  57.5-61.5               31.5
  33.5-37.5               9                  61.5-65.5               30
  37.5-41.5              11.5                65.5-69.5               26
  41.5-45.5              28                  69.5-73.5               13.5
  45.5-49.5              31.5                73.5-77.5                4
  49.5-53.5              36.5                77.5-81.5                3

Before closing the chapter a slightly different manner of graphing
the statistics is worth noticing, as it provides us with a fairly quick
though rough alternative method of determining the mode and
median.

Take, for example, the examination marks data which for this
purpose must first be thrown into the second form shown below
Table (7). We mark off on some convenient scale along OX distances
5, 10, 15, 20 . . . 65 from O to represent these numbers
of marks respectively, and at the points obtained we erect lines
parallel to OY of lengths 5, 14, 42, 91 . . . 514 to represent the
numbers of candidates who obtained not more than 5, 10, 15, 20
. . . 65 marks respectively. A freehand curve is then drawn
through the summits of these lines in the manner indicated in
fig. (4), starting from a height 5 and rising to a height 514 above
the axis OX.

By means of this curve we can approximately state at once how
many candidates obtained any given number of marks or less.
Suppose, for example, we wish to know how many candidates
obtained 22 marks or less, we have only to measure off a distance
22 from O, represented by ON, and erect a perpendicular NP to
meet the curve at P. Since NP = 110 we infer from the manner in
which the curve has been formed that 110 candidates obtained
22 marks or less, so that, incidentally, the 110th candidate from
the bottom must have obtained approximately 22 marks. This
suggests that by working backwards we can also read off roughly
the number of marks gained by any particular candidate when his
order in the list is known. Thus, to find the median, i.e. the marks
due to candidate No. 257.5, we merely draw a line parallel to OX
at a height 257.5 above it and the portion of this line cut off between
the curve and OY measures the median. The value given by this
method is approximately 31.5. Similarly the quartiles are found
by drawing lines parallel to OX at heights 128.5 and 385.5 above
it with results about 23.3 and 39.2 respectively.
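
The reading of the curve "backwards" can be imitated numerically. The sketch below is an approximation under a stated assumption: the freehand curve is replaced by straight lines joining the summits of the erected lines, so its readings will differ slightly from those taken off a smooth curve. The names are our own; the frequencies are those of the examination example.

```python
from itertools import accumulate

# Upper limits of marks (5, 10, ... 65) and the class frequencies.
upper_marks = list(range(5, 70, 5))
freqs = [5, 9, 28, 49, 58, 82, 87, 79, 50, 37, 21, 6, 3]

# Heights of the erected lines: candidates with not more than each number of marks.
cumulative = list(accumulate(freqs))


def marks_at_height(height: float) -> float:
    """Marks at which the (straight-line) curve reaches the given height above OX."""
    points = list(zip(upper_marks, cumulative))
    if not points[0][1] <= height <= points[-1][1]:
        raise ValueError("height lies outside the plotted curve")
    for (m0, c0), (m1, c1) in zip(points, points[1:]):
        if c0 <= height <= c1:
            return m0 + (m1 - m0) * (height - c0) / (c1 - c0)
    raise ValueError("height not bracketed by the plotted points")


median = marks_at_height(257.5)
lower_quartile = marks_at_height(128.5)
upper_quartile = marks_at_height(385.5)

print(cumulative[:4], cumulative[-1])
print(round(median, 1), round(lower_quartile, 1), round(upper_quartile, 1))
```

Half the difference of the two quartiles read in this way gives the quartile deviation of the distribution.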

Again,  as  we  gradually  increase  the  number  of  marks,  the  number 
of  candidates  getting  that  number  of  marks  or  less  must  increase 
also,  but  the  rate  of  this  second  increase  is  variable.  The  reader 
will  perceive  that  where  the  height  above  OX  changes  slowly  the 
gradient  of  the  curve  is  small,  but  where  it  changes  by  big  steps 
the gradient is steep, and it is at its steepest just in the
neighbourhood where the greatest addition is being made to the height
as the marks increase, i.e. where the frequency of additional
candidates is at its greatest, so determining the mode: this should be
clear on a comparison of the two arrangements of the data in and
below Table (7). By sliding a straight-edge along the contour of
the curve we can estimate approximately where the curve is
steepest, for at this point the direction of turning of the ruler or
straight-edge must change. This gives for the mode a value in the
neighbourhood of 32.

Fig. (4). Graph showing the Number of Candidates who obtained not
more than any given Number of Marks.

It  might  be  advisable  to  treat  the  other  examples  by  this  method 
also, so as to compare results.

From the mathematical point of view graphs may be regarded as
the alphabet of Algebraical Geometry.

We can locate a point P in a plane, relative to two perpendicular
lines or axes as they are called, OX, OY, which serve as boundaries
of measurement, when we know y and x, its shortest distances from
these boundaries. This fact serves to connect up Geometry, in which
points are elements, with Algebra, in which x's and y's, standing
always for numbers, are elements. The names abscissa (ab, from,
and scindo, I cut) and ordinate are given to x and y, or, when we
refer to them together, they may be spoken of as the co-ordinates of P.
The celebrated French philosopher, Descartes (1596-1650), was
the founder of Cartesian Geometry, and if we may venture to
compress the essence of his system into a single statement, it is this:
when a point P is free to take up any position in a given plane,
its x and y are quite independent: they may be allotted any values
irrespective of one another. Suppose, however, that P is constrained
to lie somewhere on an assigned curve, such as APB in the figure,
then x and y are no longer independent, for, so soon as x is fixed,
y is fixed also; it follows that in this case some relation, algebraical
or otherwise, such as $y = x^2 - 2x + 1$, must exist between x and y,
and the relation may be called the equation of the curve which gives
rise to it.

Now, if to every curve there corresponds in this way some
equation and to every equation some curve, it seems likely that the
simpler the curve the simpler will be the corresponding equation,
and vice versa. In fact, the student who does not know it already
need only refer to the most elementary treatise on graphs to find
that every equation of the first degree in x and y, i.e. one which does
not involve any $x^2$, $y^2$, $xy$, or higher powers, represents some
straight line. Any such equation, e.g.

$$x - 3y + 12 = 0,$$

can be at once thrown into either the form

$$\frac{x}{-12} + \frac{y}{4} = 1, \qquad (1)$$

where -12 and 4 are intercepts made by the line on the axes
OX and OY; or

$$y = \tfrac{1}{3}x + 4, \qquad (2)$$

where 1/3, i.e. 1 in 3, is the measure of its gradient and 4 the height
above the origin at which it cuts the axis OY.
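The passage from the general first-degree equation to the intercept
and gradient forms can be checked mechanically. The following is a
minimal sketch in Python, not part of the original text; the function
name `line_forms` is hypothetical, and it assumes that none of the
three coefficients is zero.

```python
from fractions import Fraction

def line_forms(a, b, c):
    """For the line a*x + b*y + c = 0 (a, b, c all non-zero) return
    (x_intercept, y_intercept, gradient) as exact fractions."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    x_intercept = -c / a   # put y = 0 and solve for x
    y_intercept = -c / b   # put x = 0 and solve for y
    gradient = -a / b      # coefficient of x when solved for y
    return x_intercept, y_intercept, gradient

# The example of the text: x - 3y + 12 = 0.
x_int, y_int, grad = line_forms(1, -3, 12)
print(x_int, y_int, grad)
```

For the example of the text this gives the intercepts -12 and 4 and
the gradient 1/3, in agreement with forms (1) and (2).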

Further, every equation of the second degree in x and y, which
may involve $x^2$, $y^2$, and $xy$, but no higher powers, represents
geometrically some conic, a family of curves comprising the parabola,
the ellipse, and the hyperbola, with the circle and two straight
lines as particular cases. The earth and other planets, likewise
comets, in their journeys through space travel along curves belonging
to the same family, one of ancient and historical connections.

These conics need not, however, detain us, and we pass on at
once to an example of a cubic graph to show how a very little
knowledge of the theory may be put to some practical use. Suppose a
box manufacturer has a large number of rectangular sheets of
cardboard, 3 ft. long by 2 ft. broad, and he wishes to make open boxes
with them by cutting a square piece of the same size out of each
corner and turning up the flaps that are left. How big should the
squares be if this is to be done with as little waste as possible?
Clearly this is commercially an important type of problem to solve.

[Figure: the shaded flaps are bent upwards along the dotted lines.]

Let us denote a side of the square to be cut out of each corner
by x feet. Then the bottom of the required box will have dimensions

$(3 - 2x)$ ft. by $(2 - 2x)$ ft.

and its depth will be x ft.

Hence the capacity of the box when completed will be

$x(3 - 2x)(2 - 2x)$ cu. ft.,

and he makes best use of the material who produces the most
capacious box. Call this expression y and let us find the values
of y corresponding to different values of x so as to be able to draw
roughly the curve of which the equation is

$$y = x(3 - 2x)(2 - 2x). \qquad (1)$$


Table (19). Table of Corresponding Values of x and y
in the Curve y = x(3 - 2x)(2 - 2x).

    x        2x      (3 - 2x)   (2 - 2x)   x(3 - 2x)(2 - 2x)       y

   -1       -2         5          4        -20                   -20
   -1/2     -1         4          3        -6                    -6
   -1/4     -1/2       7/2        5/2      -35/16                -2.19
    0        0         3          2         0                     0
   +1/4     +1/2       5/2        3/2      +15/16                +0.94
   +1/2     +1         2          1        +1                    +1
   +3/4     +3/2       3/2        1/2      +9/16                 +0.56
   +1       +2         1          0         0                     0
   +1 1/4   +5/2       1/2       -1/2      -5/16                 -0.31
   +1 1/2   +3         0         -1         0                     0
   +2       +4        -1         -2        +4                    +4
   +2 1/2   +5        -2         -3        +15                   +15

    0.2      0.4       2.6        1.6      (0.2)(2.6)(1.6)        0.83
    0.4      0.8       2.2        1.2      (0.4)(2.2)(1.2)        1.06
    0.6      1.2       1.8        0.8      (0.6)(1.8)(0.8)        0.86
    0.8      1.6       1.4        0.4      (0.8)(1.4)(0.4)        0.45

    0.38     0.76      2.24       1.24     (0.38)(2.24)(1.24)     1.055
    0.39     0.78      2.22       1.22     (0.39)(2.22)(1.22)     1.056
    0.40     0.80      2.20       1.20     (0.40)(2.20)(1.20)     1.056
    0.41     0.82      2.18       1.18     (0.41)(2.18)(1.18)     1.055

We get a tolerably good idea of the shape of the curve by plotting
the points (x, y) shown in Table (19) from x = -1/4 to x = +2 as in
fig. (5). It is simply a matter of practice to be able to determine
the whole curve from a few points in this way, and the greater the
number of points plotted the more accurately will it be possible
to draw the curve. It should be noticed that the points for which
y = 0 are in a sense key-points to the curve: they are readily
found by making the factors separately zero in the right-hand side
of equation (1), namely x = 0, 3 - 2x = 0, and 2 - 2x = 0, and by
plotting them first they serve as a guide to the position of points
subsequently plotted.

We want to know for what value of x the capacity of the box, y,
is greatest and the preliminary plotting is enough to indicate a
maximum value for y between x = 0 and x = 1, for the curve first
rises and then falls between these two limits. In order to discover
more exactly where the maximum is located we therefore plot
in addition the points corresponding to x = 0.2, 0.4, 0.6, 0.8
respectively, and this is done on a larger scale than that used in the
first diagram because the accuracy is thereby increased (see fig. (5)
inset).

The calculations and figure suggest that the maximum required
is very near the point for which x = 0.4, so we next work out values
of y in this neighbourhood, corresponding, say, to x = 0.38, 0.39,
0.40, 0.41, with the results shown at the foot of Table (19). From
these we conclude that to a fair degree of accuracy the maximum
value of y is given by taking x = 0.395. It would be possible in
the same way to calculate more decimal places, but we have gone
far enough to make the method clear.

Hence the side of each square cut out should be of length

0.395 ft., or 4 3/4 in.
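The method of tabulating y and then closing in on the maximum can be
carried further by machine. What follows is only a sketch under
stated assumptions: the names `capacity` and `refine_maximum`, the
number of rounds, and the number of steps per round are all
illustrative choices and not part of the original text.

```python
def capacity(x):
    """Capacity in cubic feet of the open box when squares of side
    x feet are cut from the corners of a 3 ft. by 2 ft. sheet."""
    return x * (3 - 2 * x) * (2 - 2 * x)

def refine_maximum(lo, hi, rounds=6, steps=10):
    """Tabulate capacity at equal steps over [lo, hi], keep the x that
    gives the largest value, then repeat on the narrower interval
    reaching one step either side of it."""
    best = lo
    for _ in range(rounds):
        width = (hi - lo) / steps
        xs = [lo + i * width for i in range(steps + 1)]
        best = max(xs, key=capacity)
        lo, hi = max(lo, best - width), min(hi, best + width)
    return best

# The curve first rises and then falls between x = 0 and x = 1.
best_x = refine_maximum(0.0, 1.0)
print(round(best_x, 3), round(capacity(best_x), 3))
```

Carried to more decimal places in this way the maximum settles near
x = 0.392, which agrees with the value 0.395 of the text to the degree
of accuracy there claimed; the capacity is so flat near its summit
that the difference in volume is negligible.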

Whenever the value of one variable, y, depends upon that of
another variable, x, in such a way that when x is given y is known,
so that y may be termed a function of x, corresponding values of
x and y can be plotted (as was done in the example just discussed)
and a curve drawn by joining up the points obtained, the relation
which connects x and y being the equation of this curve. Moreover,
it is possible, by calculating enough points from the equation
and plotting them, to get the curve as accurately as we please.

In Statistics, however, we usually have to start the other way
round and reach the equation, if at all, last. We make observations
of two sets of variables, a set of x's, and a set of y's, one of which
is dependent in some way upon the other (e.g. y, the dependent
variable, might denote the number of individuals observed to have
a certain organ of length x, the independent variable) and thus
we get pairs of corresponding values like
$(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots$
We met with examples of this method of recording results in the
last chapter, and we need only repeat here that its chief virtue is
suggested in the root of the word itself: it is more graphic than a
long table of figures and, by means of it, many of the essential
features of a problem are immediately seized upon.

Now  for  some  purposes  it  may  be  necessary  to  go  further  and 
to  find  what  curve  would  best  fit  the  points  plotted,  assuming  they 
were  numerous  enough,  and  what  equation  between  x  and  y  would 
best  describe  the  curve.  But  the  graphs  we  meet  in  Statistics, 
bearing,  for  instance,  upon  sociological  or  biological  problems,  are 
in  general  much  more  wayward  than  the  mathematical  kind  we 
have  referred  to  in  the  present  chapter  :  it  is  impossible  to  set 
down  simple  equations  to  which  they  can  be  rigidly  confined,  and 
when  we  are  unable  to  find  any  relation  which  accurately  and 
uniquely defines y as a function of x we must rest satisfied with the
most  manageable  equation  and  the  best  fit  we  can  get. 

In  sciences  such  as  Engineering  and  Physics  it  is  often  possible 
to fix upon two mutually dependent variables, x and y, and to
observe enough corresponding values of each to enable us to draw
a graph which answers very closely to the true relationship between
them, so that a connecting equation can be determined; e.g. we
may plot the amount of elastic stretch, y, in a wire when different
weights, x, are hung from the end of it, and it is found that y is
directly proportional to x. If we deal in this way with some
simple  figures  which  are  amenable  to  our  purpose  it  may  help  to 
make  clear  the  nature  of  the  same  problem  in  Statistics. 

The following corresponding values of x and y were given in a
Board of Education Examination (1911):

x = 1.00, 1.50, 2.00, 2.30, 2.50, 2.70, 2.80;
y = 0.77, 1.05, 1.50, 1.77, 2.03, 2.25, 2.42.

Allowing for errors of observation, it was desired to test if there
was a relation between y and x of the type

$$y = a + bx^2. \qquad (1)$$

In the first place, the shape of the curve obtained by plotting
y against x, as in fig. (6), would, to the initiated, probably suggest
a parabola, the equation of which is of type (1). In order to test
its suitability we proceed to plot y against $x^2$, or, putting
$x^2 = \xi$, we plot y against $\xi$. If equation (1) holds, then, in
that case

$$y = a + b\xi \qquad (2)$$

should also hold, and this, in $(\xi, y)$ co-ordinates, represents a
straight line. The result of plotting y against $\xi$ should therefore
be a number of points approximately in a straight line; we say
'approximately' to allow for errors of observation in the original data.

Now from the given statistics corresponding values of $\xi$ and y
are, since $\xi = x^2$:

$\xi$ = 1.00, 2.25, 4.00, 5.29, 6.25, 7.29, 7.84;
y = 0.77, 1.05, 1.50, 1.77, 2.03, 2.25, 2.42;

and the resulting graph, fig. (7), is very approximately a straight
line. To determine its equation, choose two points (not too close
together) on the line, which has been drawn so as to run as fairly
as possible through the middle of the points plotted, and, in choosing,
take points which lie at the intersections of horizontal and vertical
cross lines (the printed lines of the graph paper) if such can be
found, because their x's and y's can be read off with ease and
accuracy. Two such points are

(2.8, 1.2) and (6.0, 2.0),

and since each of these points lies on the line whose equation is

$$y = a + b\xi,$$

we have

$$1.2 = a + b(2.8)$$
$$2.0 = a + b(6.0).$$

Subtracting, we get

$$0.8 = b(3.2).$$

Therefore $b = \tfrac{1}{4}$.

Hence $a = 2 - \tfrac{3}{2} = \tfrac{1}{2}$.

Thus the equation of the line is

$$y = \tfrac{1}{2} + \tfrac{1}{4}\xi, \quad \text{i.e.} \quad 4y = \xi + 2,$$

and the law connecting x and y is therefore

$$4y = x^2 + 2.$$

Fig. (7).
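The straight line above was drawn by eye and its constants read from
two convenient points. As a check, the same data may be fitted
arithmetically. The sketch below swaps in the method of least squares,
which is not the procedure of the text, and is offered only to show
that a different way of running the line through the points leads to
nearly the same constants; the variable names are illustrative.

```python
# Observed values from the Board of Education Examination (1911).
xs = [1.00, 1.50, 2.00, 2.30, 2.50, 2.70, 2.80]
ys = [0.77, 1.05, 1.50, 1.77, 2.03, 2.25, 2.42]

# Put xi = x^2, so that y = a + b*x^2 becomes the straight line
# y = a + b*xi.
xis = [x * x for x in xs]

n = len(xis)
mean_xi = sum(xis) / n
mean_y = sum(ys) / n

# Least-squares gradient and intercept of y on xi (an assumption of
# this sketch; the text draws the line by eye instead).
b = (sum((xi - mean_xi) * (y - mean_y) for xi, y in zip(xis, ys))
     / sum((xi - mean_xi) ** 2 for xi in xis))
a = mean_y - b * mean_xi

print(f"a = {a:.3f}, b = {b:.3f}")
```

The constants so found lie close to a = 1/2 and b = 1/4, so that the
law 4y = x^2 + 2 is a fair description of the observations.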

The following statistics, the result of an experiment in Physics
to verify Boyle's Law, may be treated in the same way. x is a
number proportional to the volume of a constant weight of gas in a
closed space, and y is a number proportional to its absolute pressure.
Corresponding values of x and y observed were:

x =  46.89   41.96   40.33   38.88   37.37   36.06   34.71   33.47
y =  76.32   85.38   88.93   92.36   96.09   99.61  103.51  107.51

x =  32.39         31.08   29.97   28.76   27.26   25.32   24.04
y =  [illegible]  115.69  120.05  125.08  131.99  142.09  149.81

Boyle's Law states that the product xy is constant, and this may be
tested by putting $\xi = 1/x$ and plotting y against $\xi$; the points
obtained should be approximately in a straight line.
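The test of Boyle's Law may also be made by direct arithmetic. The
following sketch, with illustrative variable names, forms the product
xy for each pair and the gradient k of the line y = k(1/x) through the
origin. It uses fourteen of the fifteen pairs; the pair for x = 32.39
is left out because its y value cannot be read in the source.

```python
# Fourteen of the fifteen observed pairs; the pair with x = 32.39 is
# omitted because its y value is not legible in the source.
xs = [46.89, 41.96, 40.33, 38.88, 37.37, 36.06, 34.71, 33.47,
      31.08, 29.97, 28.76, 27.26, 25.32, 24.04]
ys = [76.32, 85.38, 88.93, 92.36, 96.09, 99.61, 103.51, 107.51,
      115.69, 120.05, 125.08, 131.99, 142.09, 149.81]

# If xy is constant, every product should be nearly the same number.
products = [x * y for x, y in zip(xs, ys)]
mean_product = sum(products) / len(products)
spread = (max(products) - min(products)) / mean_product

# Gradient of the line through the origin fitted to the points
# (1/x, y): minimising the sum of (y - k/x)^2 gives
# k = sum(y/x) / sum(1/x^2).
k = sum(y / x for x, y in zip(xs, ys)) / sum(1 / x ** 2 for x in xs)

print(round(mean_product, 1), round(spread, 4), round(k, 1))
```

The products differ from one another by well under one per cent. of
their mean, which is the arithmetical counterpart of the points lying
approximately in a straight line.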

Now in Statistics, as we have already explained, the exact
connection between the variables, x and y, is rarely so clear, though
the absence of law is not so complete as it might seem at first sight.
At this stage, however, we need not enter into the difficult question
of curve fitting: if the graphs are drawn with care and used with
judgment, much that is of value may be learnt by simple plotting and
by connecting up the resulting points by straight lines or a freehand
curve. We shall briefly explain or illustrate by examples how graphs
and graphical ideas may be used to serve three distinct purposes,
namely:

(1) to suggest correlation or connection between two different
    factors or events;

(2) to supply a basis for finding by interpolation some values of a
    variable when others are known;

(3) as pictorial arguments appealing to the reason through the eye.

We reserve (2) and (3) for the next chapter and proceed at present
with an example of (1).

Correlation  suggested  by  Graphical  means.  Consider  the  index 
numbers,  col.  (2)  Table  (20),  showing  the  variation  from  year  to 
year  in  wholesale  prices  between  the  years  1871  and  1912.  It  is 
not  an  easy  matter  to  take  in  satisfactorily  the  meaning  of  such  a 
mass  of  bare  figures,  but  they  are  much  easier  to  grasp  when  plotted 
in  a  graph. 

In this case the numbers x, representing years, and the numbers y,
representing prices, are measures of things of quite a different
character, so that it is not necessary to take the x and y units of the
same size. Moreover they need not, in a case of this kind,
necessarily vanish at the origin, but it is convenient to draw the graph
in such a way that it shall occupy the greater part of the space at
our disposal. Thus, we have roughly 80 small squares across the
breadth of our graph paper, and between 1871 and 1912 we have
roughly 40 years; we therefore take two sides of a square to 1 year
and mark off the years 1870, 1875, 1880, . . ., along an axis or
base line parallel to the breadth of the paper, as shown in fig. (8).
Again we have roughly 70 small squares in the available space
from this base line to the top of our graph paper, and the
wholesale price index numbers vary from 88.2 to 151.9, a range of 63.7;
we therefore take one side of a square to correspond to a difference
of 1 in the price index number, and mark off the prices 90, 100,
110, . . ., along an axis parallel to the length of the paper, as
shown in the figure.

We then plot points to represent the numbers in col. (2) of
Table (20). Thus, in 1880 wholesale prices stood at 129; we
therefore travel along the width of the paper till we reach 1880 and
then upwards until we are opposite the 129 level on the axis of
prices, inserting a dot to mark the position. Similarly for all other
points, and the required graph is given by joining them up in
succession.

Table (20). Marriage Rate and Wholesale Prices
Index Numbers.

  (1)     (2)       (3)           (4)            (5)        (6)            (7)
 Year.   Prices.  Nine Years'   Difference     Marriage   Nine Years'    Difference
                  Average of    between Nos.   rate.      Average of     between Nos.
                  Prices.       in Cols.                  Marriage       in Cols.
                                (2) & (3).                rate.          (5) & (6).

 1871    135.6      ..            ..            167         ..             ..
 1872    145.2      ..            ..            174         ..             ..
 1873    151.9      ..            ..            176         ..             ..
 1874    146.9      ..            ..            170         ..             ..
 1875    140.4     139.3         +1.1           167        164            + 3
 1876    137.1     138.6         -1.5           165        162            + 3
 1877    140.4     136.5         +3.9           157        159            - 2
 1878    131.1     133.8         -2.7           152        157            - 5
 1879    125.0     131.5         -6.5           144        155            -11
 1880    129.0     128.5         +0.5           149        153            - 4
 1881    126.6     125.2         +1.4           151        151             ..
 1882    127.7     120.8         +6.9           155        149            + 6
 1883    125.9     117.2         +8.7           155        148            + 7
 1884    114.1     114.7         -0.6           151        148            + 3
 1885    107.0     111.8         -4.8           145        149            - 4
 1886    101.0     109.2         -8.2           142        149            - 7
 1887     98.8     106.9         -8.1           144        149            - 5
 1888    101.8     104.2         -2.4           144        149            - 5
 1889    103.4     102.5         +0.9           150        149            + 1
 1890    103.3     101.0         +2.3           155        149            + 6
 1891    106.9      99.9         +7.0           156        150            + 6
 1892    101.1      98.7         +2.4           154        151            + 3
 1893     99.4      97.4         +2.0           147        153            - 6
 1894     93.5      96.3         -2.8           150        155            - 5
 1895     90.7      95.0         -4.3           150        156            - 6
 1896     88.2      94.3         -6.1           157        156            + 1
 1897     90.1      93.8         -3.7           160        157            + 3
 1898     93.2      93.4         -0.2           162        158            + 4
 1899     92.2      93.8         -1.6           165        159            + 6
 1900    100.0      94.7         +5.3           160        159            + 1
 1901     96.7      95.7         +1.0           159        159             ..
 1902     96.4      96.9         -0.5           159        158            + 1
 1903     96.9      98.3         -1.4           157        158            - 1
 1904     98.2      99.5         -1.3           153        156            - 3
 1905     97.6     100.0         -2.4           153        155            - 2
 1906    100.8     101.3         -0.5           157        154            + 3
 1907    106.0     102.8         +3.2           159        153            + 6
 1908    103.0     104.8         -1.8           151        153            - 2
 1909    104.1      ..            ..            147         ..             ..
 1910    108.8      ..            ..            150         ..             ..
 1911    109.4      ..            ..            152         ..             ..
 1912    114.9      ..            ..            155         ..             ..

It is comparatively easy from this graph to trace the change
in prices from year to year and from decade to decade: for example,
we note that from 1873 to 1896 the tendency of prices was on the
whole downward, and from 1896 to 1910 the tendency was upward.
Also on the assumption (not necessarily valid) that prices have
varied continuously, or at least consistently, during the intervals
between the dates to which the records refer, it is possible to read
off intermediate values from the graph: e.g. midway between 1883
and 1884 we get the figure 120 as the index number for prices.

On  the  same  graph  sheet  we  have  also  plotted  the  marriage  rate 
from  year  to  year  during  the  same  period.  The  numbers  are  given 
in  col.  (5)  of  Table  (20).  This  rate  varies  from  142  to  176,  a  range 
of  34,  and  we  have  a  range  of  40  small  squares  at  our  disposal  in 
plotting  ;  a  difference  of  1  in  the  marriage  rate  has  therefore  been 
taken  to  correspond  to  one  side  of  a  square,  and  the  marriage  rates 
140, 150, 160 . . . are accordingly marked along the axis perpendicular
to the same base line as before, which is used again to
measure  the  passage  of  years,  but  the  second  graph  is  drawn  below 
the  line  whereas  the  first  was  drawn  above  it.  In  this  way  we 
are  able  to  compare  the  two  graphs,  namely,  the  one  registering 
the  change  in  prices  and  the  one  registering  the  change  in  marriage 
rate  from  year  to  year. 

It is interesting to observe that the two seem to be not
unconnected: they go up and down almost in the same time, and
mountains and valleys in the one correspond roughly to mountains and
valleys in the other; in other words, there is some kind of correlation
or reciprocal relation between them. Now these mountains and
valleys are largely the result of what may be called short-time
fluctuations, and it is important to distinguish between these changes
which are transient and the more permanent or long-time changes.
In order to get rid of the former, which sometimes conceal the
latter, the following device has been adopted: noticing that the
wave period, the length of time taken for each complete up-and-down
motion, is one of about nine years, nine-yearly averages have
been taken of the figures for wholesale prices right down col. (2)
of Table (20); thus 139.3 is the average of the index numbers from
1871 to 1879 inclusive, 138.6 is the average of the numbers from
1872 to 1880 inclusive, and so on, the results being recorded in
col. (3). When the points corresponding to these numbers are
plotted we get the broken line in fig. (8) passing through the body
of the original graph of prices and indicating its general trend in
the course of years as separated from the temporary fluctuations.

Fig. (8). Graph showing Variation in Wholesale Prices Index Numbers.

Fig. (9). Graph showing Variation in Marriage Rate Index Numbers.

The same procedure has been followed with the marriage rate
statistics; the nine-yearly averages are shown in col. (6) of Table (20),
and their graph appears as a broken line passing through the body
of the original marriage rate graph in fig. (9).

Suppose we wish on the other hand to study the short-time
fluctuations as distinct from the long-time changes, we may do so
by forming the differences between the numbers for each year
and the corresponding nine-yearly averages, and plotting these
differences on convenient scales.

The numbers obtained in this way are recorded, with their proper
signs (positive if above the average, negative if below), in cols. (4)
and (7) of Table (20), and the graphs of these differences are drawn,
one below the other for comparison, on the same graph sheet
(fig. 10). The agreement in fluctuation from the average between
the two factors, marriage rate and prices, is more easily remarked
now than it was in the original graphs. High prices go as a rule
hand-in-hand with prosperous times, and such times lead to more
frequent marriages. This statement must not be taken to imply
that when prices are high the times are always necessarily
prosperous for the community as a whole: the lie direct would be given
to such an implication by any one who had experienced abnormal
war conditions.

Fig. (10).
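The nine-yearly averages of col. (3) and the differences of col. (4)
are easily reproduced. The sketch below is a minimal illustration
using only the first fifteen price index numbers of Table (20); the
function name `centred_average` is hypothetical.

```python
# Wholesale price index numbers, col. (2) of Table (20), 1871 to 1885.
prices = {
    1871: 135.6, 1872: 145.2, 1873: 151.9, 1874: 146.9, 1875: 140.4,
    1876: 137.1, 1877: 140.4, 1878: 131.1, 1879: 125.0, 1880: 129.0,
    1881: 126.6, 1882: 127.7, 1883: 125.9, 1884: 114.1, 1885: 107.0,
}

def centred_average(series, year, span=9):
    """Mean of the `span` consecutive yearly values centred on `year`."""
    half = span // 2
    window = [series[y] for y in range(year - half, year + half + 1)]
    return sum(window) / span

# Long-time trend (col. 3) and short-time fluctuation (col. 4) for the
# years whose nine-year window lies wholly inside the data given here.
for year in range(1875, 1882):
    trend = centred_average(prices, year)
    deviation = prices[year] - round(trend, 1)
    print(year, round(trend, 1), round(deviation, 1))
```

The average is entered against the middle year of each group of nine,
which is why cols. (3) and (4) are blank for the first four and the
last four years of the table.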

After  about  1892,  while  the  fluctuations  continue  to  be  similar, 
a  tendency  appears  for  the  marriage  rate  graph  to  reach  each 
extreme  point  about  a  year  in  advance  of  the  other,  as  though  an 
increase  in  marriages  raised  prices  and  a  decrease  lowered  them. 
There  is  no  doubt  that  any  economic  change,  especially  if  it  takes 
place  on  a  large  scale,  will  set  up  a  system  of  corresponding  forces, 
sometimes in unexpected directions, actions and reactions succeeding
one another at intervals like tidal waves producing each a backwash
as it breaks, but such effects, even when anticipated in theory,
are  not  always  easy  to  unravel  in  practice. 
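The agreement between the two sets of fluctuations, and the suggestion
that one may run a year ahead of the other, can be given a rough
numerical form. The sketch below uses the product-moment correlation
coefficient, a measure which the discussion above does not itself
employ, so it should be read as an illustration only; the names
`pearson` and `lagged` are hypothetical.

```python
from math import sqrt

# Cols. (4) and (7) of Table (20), 1875 to 1908. The two blank entries
# in col. (7) are zero differences (151 - 151 and 159 - 159).
price_dev = [1.1, -1.5, 3.9, -2.7, -6.5, 0.5, 1.4, 6.9, 8.7, -0.6,
             -4.8, -8.2, -8.1, -2.4, 0.9, 2.3, 7.0, 2.4, 2.0, -2.8,
             -4.3, -6.1, -3.7, -0.2, -1.6, 5.3, 1.0, -0.5, -1.4, -1.3,
             -2.4, -0.5, 3.2, -1.8]
marriage_dev = [3, 3, -2, -5, -11, -4, 0, 6, 7, 3,
                -4, -7, -5, -5, 1, 6, 6, 3, -6, -5,
                -6, 1, 3, 4, 6, 1, 0, 1, -1, -3,
                -2, 3, 6, -2]

def pearson(u, v):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)

def lagged(lead):
    """Correlate the marriage deviation of year t with the price
    deviation of year t + lead (lead = 0 means the same year)."""
    if lead == 0:
        return pearson(price_dev, marriage_dev)
    return pearson(price_dev[lead:], marriage_dev[:-lead])

r_same_year = lagged(0)
r_marriage_leads = lagged(1)
print(round(r_same_year, 2), round(r_marriage_leads, 2))
```

A coefficient well above zero for the same year expresses in a single
figure what the eye gathers from fig. (10); comparing it with the
lagged value over the years after 1892 alone would be one way of
examining the apparent lead of the marriage rate.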

The comparison we have been discussing between changes in
prices and marriages is suggested in Sir W. H. Beveridge's
Unemployment. The whole book will repay careful study, but it contains
one particularly illuminating chapter on 'Cyclical Fluctuation' with
a chart labelled 'The Pulse of the Nation,' because of the
remarkable picture it gives of the ebb and flow of the tide of national
prosperity. It consists of a series of curves representing
respectively:

(1) bank rate of discount per cent.;

(2) foreign trade as measured by imports and exports per head
    of the population;

(3) percentage of trade union members not returned as
    unemployed;

(4) number of marriages per 1000 of the population;

(5) number of indoor paupers per 1000 of the population;

(6) gallons of beer consumed per head of the population;

(7) nominal capital of new companies registered in pounds per
    head of the population.

The interesting thing about these curves is to see the way in
which they move in waves of varying size up and down almost
together, showing a connection between such phenomena more
intimate than one might at first have suspected. A note of caution
must be inserted here however: causal connection must not be too
confidently inferred in discussing the correlation of characters
changing simultaneously with time; because two events happen
together, one is not necessarily caused by the other.

An  instructive  article  bearing  on  this  point  appeared  recently  in  a 
periodical  well  known  to  students  of  social  problems.  It  was  there 
stated  that  high  positive  correlation  exists  between  birth  rate  and 
infantile  death  rate  :  in  general  the  two  rise  or  fall  together,  whence 
Neo-Malthusians  argue  that  the  way  to  lower  a  death  rate  is  to 
lower  the  birth  rate.  The  writer  then  contrasts  Bradford,  the  last 
word in the scientific care of infants, with Roscommon, where
conditions as to wealth and child welfare are the very reverse, and
points out that Bradford has a birth rate of 13 and an infant death
rate of 135, while Roscommon has a birth rate of 45 and an infant
death rate of 35. These figures, he suggests, prove instantaneously
that the Neo-Malthusians are guilty of the commonest of all fallacies:
they confound correlation with causation.

As an exercise in plotting the reader may see whether he can
discover any suggestion of correlation between crime and
unemployment by comparing the following statistics, showing the number
of indictable offences tried in the United Kingdom and the trade
union unemployed percentages respectively from 1861 to 1905:

Table (21). Number of tried Indictable Offences and
Trade Union Unemployed Percentages (1861-1905).

        No. of Indictable  Trade Union   |         No. of Indictable  Trade Union
 Year.  Offences tried     Unemployed    |  Year.  Offences tried     Unemployed
        (in thousands).    percentages.  |         (in thousands).    percentages.

 1861        56.0              3.7       |  1874        53.5              1.7
 1862        61.3              6.0       |  1875        50.0              2.4
 1863        61.4              4.7       |  1876        51.9              3.7
 1864        58.4              1.9       |  1877        53.8              4.7
 1865        59.9              1.8       |  1878        56.0              6.8
 1866        57.6              2.6       |  1879        55.0             11.4
 1867        59.5              6.3       |  1880        60.7              5.5
 1868        62.4              6.7       |  1881        60.6              3.5
 1869        61.3              5.9       |  1882        63.3              2.3
 1870        56.1              3.7       |  1883        60.8              2.6
 1871        53.1              1.6       |  1884        59.6              8.1
 1872        51.9              0.9       |  1885        56.4              9.3
 1873        53.5              1.2       |
 1886        56.2             10.2       |  1896        50.7              3.3
 1887        56.2              7.6       |  1897        50.7              3.3
 1888        58.5              4.9       |  1898        52.5              2.8
 1889        57.6              2.1       |  1899        50.5              2.0
 1890        55.0              2.1       |  1900        53.6              2.5
 1891        54.1              3.5       |  1901        55.5              3.3
 1892        58.3              6.3       |  1902        57.1              4.0
 1893        57.4              7.5       |  1903        58.4              4.7
 1894        56.3              6.9       |  1904        60.0              6.0
 1895        50.8              5.8       |  1905        61.5              5.0

The chief point of difficulty in plotting such graphs is the initial
one of fixing upon the most convenient scales to use, and in this
matter hints only can be given; facility will come by practice. An
examination of Table (21) shows that the data cover a period of
forty-five years, which can be marked off horizontally along a base
line so as just to fit comfortably into the available space across the
graph paper. The unemployed percentages vary between 0.9 and 11.4,
giving a range of 10.5. Similarly the indictable offences recorded
(in thousands) vary between 50.0 and 63.3, a range of 13.3. We might
therefore very well choose the same vertical scale for the measurement
of indictable offences and unemployment, but, in order that the graphs
may run more or less together (without exactly overlapping) for
the sake of comparison, only the unemployment zero need be taken
actually on the base line, whereas the indictable offences may have,
say, the number 50 (thousand) at that level; also it will be
convenient to show the scale for unemployment on the right side
and the scale for offences on the left side of the paper.
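
The choice of scales just described can be checked with a few lines of
arithmetic. The following Python sketch uses only the extreme values
quoted for Table (21); the helper `plot_height` and the convention of
one graph-paper division per unit are illustrative assumptions, not
something prescribed by the text.

```python
# Extreme values quoted for Table (21).
unemployment_min, unemployment_max = 0.9, 11.4   # per cent
offences_min, offences_max = 50.0, 63.3          # thousands

unemployment_range = round(unemployment_max - unemployment_min, 1)
offences_range = round(offences_max - offences_min, 1)


def plot_height(value, value_at_base_line, units_per_division=1.0):
    """Height above the base line in graph-paper divisions (hypothetical helper)."""
    return (value - value_at_base_line) / units_per_division


# Unemployment is measured from 0 on the base line,
# offences from 50 (thousand) on the same line.
highest_unemployment = plot_height(unemployment_max, 0.0)
highest_offences = plot_height(offences_max, 50.0)

print(unemployment_range, offences_range)
print(highest_unemployment, round(highest_offences, 1))
```

With these origins both graphs occupy roughly the same band of the
paper, which is what allows them to be compared by eye.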

An example dealing with matters somewhat different is provided
by a comparison of changes from week to week in:

(1)  the  mean  air  temperature  ; 

(2)  the  percentage  of  possible  sunshine  ;    and 

(3)  the  rainfall. 

The  following  is  a  record  of  observations  taken  at  Greenwich  in 
1912  [data  from  London  Statistics,  vol.  xxiii.] : — 
Table (22). Weekly Meteorological Observations
at Greenwich (1912).

              Mean Air         Percentage
Week          Temperature      of possible     Rainfall
ended         (degrees         Sunshine.       in inches.
              Fahrenheit).

Jan.   6         45.7              7             0.76
      13         41.9             15             0.45
      20         40.2              1             0.93
      27         38.9              8             0.88
Feb.   3         30.0             21             0.02
      10         39.5             15             0.52
      17         45.5             11             0.44
      24         47.4              6             0.65
Mar.   2         49.8             21             0.52
       9         44.6             31             0.79
      16         45.1             16             0.19
      23         42.7             15             1.08
      30         51.0             46             0.05
Apr.   6         48.0             43             0.07
      13         45.6             43             0.02
      20         50.0             50             0.00
      27         52.6             76             0.00
May    4         50.1             32             0.21
      11         59.7             29             0.06
      18         55.2             49             0.69
      25         54.1             38             0.19
June   1         57.0             47             0.17
       8         54.2             35             0.99
      15         58.1             48             0.39
      22         61.7             56             0.65
      29         60.2             45             0.30
July   6         58.7             15             0.36
      13         67.0             46             0.20
      20         65.8             44             0.04
      27         64.8             31             0.16
Aug.   3         57.8             33             0.54
      10         57.6             28             1.26
      17         56.2             14             0.23
      24         57.2             24             1.27
      31         56.9             27             1.33
Sept.  7         54.8             36             0.21
      14         52.4             14             0.02
      21         53.6             22             0.00
      28         51.5             59             0.02
Oct.   5         48.8             36             2.30
      12         46.0             53             0.00
      19         49.8             38             0.13
      26         45.4             23             0.98
Nov.   2         49.1             31             0.55
       9         47.2              6             0.18
      16         43.3              3             0.17
      23         46.2              6             0.31
      30         40.4             13             1.06
Dec.   7         42.4              9             0.31
      14         49.0              2             0.62
      21         44.4             19             0.59
      28         48.1              8             1.22

The rainfall graph here should be drawn reversed (i.e. so that
it goes up as the rainfall goes down in amount, and vice versa),
because one would expect in general much rain to go with little
sun and low temperature.

The range of temperature during the year is 37 degrees, of sunshine
75 per cent., and of rainfall 2.30 in. Hence the vertical
scales for these three graphs might be chosen so that, roughly,
40 units of temperature should correspond to 80 units of sunshine
and 2 units of rainfall. Also the zeros of the three variables should
be so placed, relative to the horizontal base line registering the
weeks, that the three graphs may be conveniently compared without
causing confusion by too closely overlapping.
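
As a rough illustration of these three scales, the Python sketch below
converts a reading of each kind into a common plotting height, with
40 degrees, 80 per cent. and 2 in. spanning equal heights and the
rainfall graph reversed. The function name `height` and the particular
zero levels (30 degrees, 0 per cent., 2.30 in.) are assumptions made
for the example only.

```python
# One plotting unit is taken as 1 degree F, 2 per cent of sunshine or
# 0.05 in. of rain, so that 40 degrees, 80 per cent and 2 in. are equally tall.
UNITS = {"temperature": 1.0, "sunshine": 2.0, "rainfall": 0.05}


def height(kind, value, zero_level):
    """Plotting height of `value` relative to the level at which `zero_level` is drawn."""
    h = (value - zero_level) / UNITS[kind]
    # The rainfall graph is reversed: more rain plots lower.
    return -h if kind == "rainfall" else h


# Week ended July 13 in Table (22): 67.0 degrees, 46 per cent, 0.20 in.
print(height("temperature", 67.0, 30.0))
print(height("sunshine", 46, 0))
print(height("rainfall", 0.20, 2.30))   # measured downwards from the wettest week
```

Shifting any of the three zero levels moves the corresponding graph
bodily up or down without altering its shape, which is how overlapping
can be avoided.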


CHAPTER    IX 

GRAPHS (continued)

Graphical  Ideas  as  a  Basis  for  Interpolation.  It  frequently  happens 
in  statistical  records  that  awkward  gaps  occur  which  require  to  be 
filled  in ;  this  may  be  due  to  the  fact  that  no  record  has  been 
made,  or  that  it  has  been  made  with  insufficient  detail,  or  that  it 
has  been  lost  or  destroyed.  Cases  in  point  arise  in  connection  with 
returns like that of the Census which can only be undertaken every
few  years,  so  that  if  figures  are  wanted  for  any  intervening  year, 
as  they  are  in  very  many  instances,  an  estimate  has  to  be  made 
from  the  known  results  of  the  years  recorded.  It  is  imperative,  for 
example,  for  many  purposes  of  local  or  national  government,  to 
be  able  to  find  with  a  fair  degree  of  accuracy  the  population  of 
county  boroughs  and  urban  or  rural  districts  at  any  given  time, 
to  know  the  number  of  workers  engaged  in  different  occupations, 
the amount of land in pasture and under various crops, the condition
of the people as to housing, of the children as to education,
and  so  on  indefinitely. 

Symbolically, with the same notation as we have used before,
we conceive the statistics in tabular form, like

$$\begin{array}{ccccc}
x_1, & x_2, & x_3, & \dots & x_n \\
y_1, & y_2, & y_3, & \dots & y_n
\end{array}$$

each y denoting the frequency corresponding to the character
measured by its companion x, e.g. the x's may stand for successive
dates and the y's for the frequencies of the population of a certain
district at those dates. If it happens that one or more of the y's, in
between the first and the last recorded, are missing, the problem is
to estimate the missing values by some method of interpolation, as
it is called. Various methods of arriving at such estimates are used,
but we shall only refer to the more elementary here.

A rough way of making the estimate, but one which is often as
accurate as the data will allow, is to plot the observations, each
(x, y) being represented by a point, and connect them up, if there
be enough of them, by a smooth curve drawn freehand
$P_1 P_2 P_3 \dots P_n$ [see fig. (11)]; to find the y proper to any
other x we have then only to draw the ordinate through the point
(x, 0) and measure the y at the point where it cuts the curve. This
is a not unreasonable principle to follow, for in effect it gives due
weight to each of the observations actually recorded, and it assumes
an even course from each one to the next, a justifiable assumption
in the absence of any evidence that some sudden discontinuity of
value has taken place.

If only two observations are given, represented by the points
$P_1\,(x_1, y_1)$ and $P_2\,(x_2, y_2)$, the curve connecting them is
a straight line, and the y corresponding to any other x is at once
given geometrically, as fig. (12) shows, by

$$\frac{PM}{P_2M_2} = \frac{P_1M}{P_1M_2},$$

i.e.

$$\frac{y-y_1}{y_2-y_1} = \frac{x-x_1}{x_2-x_1},$$

or

$$y = y_1 + \frac{y_2-y_1}{x_2-x_1}\,(x-x_1),$$

the familiar proportional relation which is employed in this simple
case.

[Fig. (12).]

Example. Given

    log 5.82673 = 0.7654249,
    log 5.82674 = 0.7654257.

Required log 5.826736.

Here

$$x_1 = 5.826730, \quad y_1 = 0.7654249,$$
$$x_2 = 5.826740, \quad y_2 = 0.7654257,$$
$$x = 5.826736.$$

Therefore, by means of the above relation,

$$\begin{aligned}
y &= 0.7654249 + \frac{0.0000008}{0.000010}\,(0.000006) \\
  &= 0.7654249 + 0.00000048 \\
  &= 0.7654254.
\end{aligned}$$

The logarithmic curve y = log x is, of course, not a straight line,
and the value obtained for y only represents a first approximation
to the true value.
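
The proportional relation is easily mechanized. The following Python
sketch (the function name `interpolate_linear` is merely illustrative)
repeats the logarithm example and compares the estimate with the value
returned by the standard library.

```python
import math


def interpolate_linear(x1, y1, x2, y2, x):
    """Proportional-parts estimate of y at x from two known points."""
    return y1 + (y2 - y1) / (x2 - x1) * (x - x1)


estimate = interpolate_linear(5.82673, 0.7654249, 5.82674, 0.7654257, 5.826736)

print(round(estimate, 7))
# Departure from the library value of the common logarithm:
print(abs(estimate - math.log10(5.826736)))
```

Over so short an interval the curve is very nearly straight, so the
first approximation already agrees with the true value to the number
of places tabulated.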

When more than two points are given there is bound to be a
margin of inaccuracy, more or less according to the data, introduced
in drawing the curve. For an example of this method the
reader may refer back to the curve on p. 67, which was used to
determine the median and quartiles. We may, as we saw, read
off from it the number of candidates who obtained not more than
any stated number of marks: e.g. 300 candidates obtained not
more than 34 marks; or we may use it the other way round and
find the number of marks obtained by a stated number of candidates:
e.g. 10 per cent. of the candidates got less than 17 marks.
Such examples might be multiplied endlessly, and the method will
be found extremely useful when a high degree of accuracy is not
looked for. But greater confidence will be felt perhaps in such
results (though the foundation for it may be no more secure in many
cases) if we can translate them from geometrical to algebraical
form, if we can find, that is to say, some formula, like the simple
proportional relation already introduced above, which will give
one y when others are known.

In order to make the argument as general as possible we shall
speak of x and y as variables, and we shall think of the value of y
as depending upon that of x in such a way that when x is given,
y is known or it can be estimated * (in the sense that when the
year is given the population is known or can be estimated).

Suppose

$$y = c_0 + c_1 x + c_2 x^2 + \dots,$$

where the c's are constants to be determined, and their number
can be made to depend upon the number of known values of y
which are used in the estimate.

[* This is equivalent to assuming that y is some function of x, say y = f(x), and
clearly some such assumption is necessary if any estimate from the known values
to the unknown is to be possible. Further, for simplicity we assume f(x) can
be expanded in a Maclaurin's converging series of ascending powers of x, which
simply means that we take the relation between x and y to be of the form
adopted above.]

Geometrically, the equation

$$y = c_0 + c_1 x + c_2 x^2 + \dots + c_n x^n$$

represents a curve called a parabola of the nth order, and such
a curve could be employed (and uniquely found, for there is only one
parabola of the kind which will go through all the points) if we
based our estimate upon a knowledge of (n+1) y's corresponding
to given x's, for we could readily make it pass through the (n+1)
known points $(x_0, y_0), (x_1, y_1), (x_2, y_2), \dots (x_n, y_n)$ by choosing
the (n+1) c's so as to satisfy the (n+1) simple linear relations:

$$\begin{aligned}
y_0 &= c_0 + c_1 x_0 + c_2 x_0^2 + \dots + c_n x_0^n, \\
y_1 &= c_0 + c_1 x_1 + c_2 x_1^2 + \dots + c_n x_1^n, \\
    &\;\;\vdots \\
y_n &= c_0 + c_1 x_n + c_2 x_n^2 + \dots + c_n x_n^n.
\end{aligned}$$

When the curve is determined, in other words when the c's are
known, we can find any other y required by substituting the corresponding
x in the equation

$$y = c_0 + c_1 x + c_2 x^2 + \dots + c_n x^n,$$

i.e. by supposing this point (x, y) to lie on the same curve that goes
through the known points.

It  is  well  to  mention  here  that  the  parabola  is  by  no  means  always 
the  best  curve  for  fitting  any  given  statistics,  and  when  the  number 
of  observations  is  adequate  it  is  possible  often  to  make  a  more 
satisfactory  choice.  Once  the  equation  of  a  suitable  curve  has 
been  determined  the  subsequent  interpolation  or  calculation  of  y 
for  any  given  x  is  not  as  a  rule  a  very  difficult  matter.  The  larger 
question  of  curve  fitting  in  general  is  reserved  for  a  later  chapter. 

Example  of  First  Method  (fitting  with  a  parabolic  curve).  Let  us 
illustrate  this  process  of  interpolation  by  fitting  a  parabolic  curve 
to  the  following  figures,  extracted  from  Porter's  The  Progress  of 
the  Nation,  giving  the  annual  cost  of  Poor  Relief  (excluding  insane 
and  casual)  at  five-yearly  intervals,  but  with  the  amount  for  the 
year  1845  omitted  : — 

    Year            1835,   1840,   1845,   1850,   1855
    Cost in £1000   5526,   4577,     ?,    5395,   5890

Assuming that no extraordinary conditions prevailed in 1845 to
cause abnormality in expenditure, let us estimate what the figure
would be for that year judging from the given records just before
and after. Since there are four known points in this case, we take
as the curve through them a parabola of the 3rd order, namely:

$$y = c_0 + c_1 x + c_2 x^2 + c_3 x^3; \qquad (1)$$

the four known points will then just suffice to determine uniquely
the four arbitrary constants $c_0, c_1, c_2, c_3$. Also, since the x
class-intervals are equal, it will simplify the algebra if we measure from
the year 1845 as origin, taking five years as unit for x and £1000
as unit for y, so that we get

$$\begin{array}{rrrrrr}
x = & -2, & -1, & 0, & +1, & +2 \\
y = & 5526, & 4577, & y_0, & 5395, & 5890
\end{array}$$

where $y_0$ is the number to be determined.

Since all five points are to lie on the curve with equation as in
(1), we have by substituting in that equation:

$$\begin{aligned}
5526 &= c_0 - 2c_1 + 4c_2 - 8c_3, \\
4577 &= c_0 - c_1 + c_2 - c_3, \\
y_0  &= c_0, \\
5395 &= c_0 + c_1 + c_2 + c_3, \\
5890 &= c_0 + 2c_1 + 4c_2 + 8c_3.
\end{aligned}$$

Adding the first and last of these equations,

$$2c_0 + 8c_2 = 5526 + 5890. \qquad (2)$$

Adding the second and last but one,

$$2c_0 + 2c_2 = 4577 + 5395,$$

or

$$8c_0 + 8c_2 = 4(4577 + 5395). \qquad (3)$$

Subtracting (2) from (3),

$$\begin{aligned}
6c_0 &= 4(4577 + 5395) - (5526 + 5890) \qquad (4) \\
     &= 4(9972) - (11416) \\
     &= 39888 - 11416 \\
     &= 28472.
\end{aligned}$$

Therefore $y_0 = c_0$ = £4,745,000.

If we only wish to make use of the records for the years 1840
and 1850, the appropriate fitting curve reduces to a straight line

$$y = c_0 + c_1 x,$$

on which we assume the points

$$(-1,\ 4577), \quad (0,\ y_0), \quad (+1,\ 5395)$$

to lie, so that

$$\begin{aligned}
4577 &= c_0 - c_1, \\
y_0  &= c_0, \\
5395 &= c_0 + c_1.
\end{aligned}$$

Therefore, adding the first and last of these equations,

$$2c_0 = 4577 + 5395,$$

so that $y_0 = c_0$ = £4,986,000.
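
The first method amounts to solving a small set of linear equations,
and may be sketched in Python as follows. The function name
`fit_polynomial` and the use of exact fractions with Gauss-Jordan
elimination are choices made for this illustration; the text itself
eliminates the constants by hand.

```python
from fractions import Fraction


def fit_polynomial(points):
    """Return [c0, c1, ..., cn] for the parabola of order n through n+1 points,
    solving the linear relations y_i = c0 + c1*x_i + ... + cn*x_i**n exactly."""
    n = len(points)
    rows = [[Fraction(x) ** k for k in range(n)] + [Fraction(y)] for x, y in points]
    for col in range(n):
        # Bring a row with a non-zero entry in this column into position.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


# x is measured from 1845 in units of five years; y is the cost in thousands of pounds.
cubic = fit_polynomial([(-2, 5526), (-1, 4577), (1, 5395), (2, 5890)])
line = fit_polynomial([(-1, 4577), (1, 5395)])

print(float(cubic[0]))   # estimate for 1845 from the four records
print(float(line[0]))    # estimate for 1845 from 1840 and 1850 only
```

Since x = 0 corresponds to 1845, the constant term is in each case the
estimate required: about 4745 for the cubic and 4986 for the straight
line, in agreement with the figures obtained above.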

* Second Method (using a formula connecting the ordinates). When,
as above, the steps from each x to the next are equal, as commonly
happens in practice, it is possible to write down a simple relation
between the y's, known and unknown, without introducing the c's
at all. At bottom the method is the same as the last, inasmuch as
the elimination of the c constants by the first method really results
in the same formula for the unknown y.

[* The non-mathematical reader will do well to omit the rest of this section on
interpolation.]

Let us represent the given statistics in this case by

$$\begin{array}{ccccc}
x_0, & x_0 + h, & x_0 + 2h, & \dots & x_0 + nh \\
y_0, & y_1, & y_2, & \dots & y_n
\end{array}$$

so that, if the fitting curve be

$$y = c_0 + c_1 x + c_2 x^2 + \dots + c_n x^n,$$

we have, by substituting the co-ordinates of the first two points
in this equation,

$$y_1 = c_0 + c_1 (x_0 + h) + c_2 (x_0 + h)^2 + \dots + c_n (x_0 + h)^n$$

and

$$y_0 = c_0 + c_1 x_0 + c_2 x_0^2 + \dots + c_n x_0^n.$$

Hence

$$y_1 - y_0 = c_1 h + c_2 (2 x_0 h + h^2) + \dots + c_n (n x_0^{n-1} h + \dots).$$

Now this result, which we call the 1st difference between the y's,
is of (n-1)th degree in $x_0$, so that by subtracting two of the y's
we have reduced the degree in $x_0$ by 1. Similarly,

$$y_2 - y_1 = c_1 h + c_2 (2 x_0 h + 3h^2) + \dots + c_n (n x_0^{n-1} h + \dots).$$

Thus we get a series of 1st differences, each with the highest
term of the (n-1)th degree in $x_0$. Treating them as a series of new
ordinates and forming their differences in the same way, we get
what may be called the 2nd differences between the y's, a series
of ordinates each with the highest term of degree (n-2) in $x_0$.
Proceeding in this way the 3rd differences between the y's are a
series of ordinates of degree (n-3) in $x_0$, the 4th differences are of
degree (n-4), and so on, until ultimately we reach the nth differences,
which are of zero degree in $x_0$, and consequently involve only h.
It follows that the nth differences must all be equal in value and
therefore, if we go one step further and write down the (n+1)th
differences, these must vanish altogether.

If  the  reader  finds  any  difficulty  in  following  the  argument 
he  should  test  it  step  by  step  for  himself  in  the  simple  case  of  a 
parabola  of  the  third  order  when  it  should  be  perfectly  clear. 
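
The test suggested here can also be made numerically. The Python
sketch below forms the successive differences of a cubic tabulated at
equal steps; the particular cubic and the function name
`difference_table` are hypothetical, chosen only for illustration.

```python
def difference_table(ys):
    """Return [ys, 1st differences, 2nd differences, ...] down to a single entry."""
    table = [list(ys)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table


# A hypothetical parabola of the third order, tabulated at x = 0, 1, ..., 5 (h = 1).
def cubic(x):
    return 2 * x**3 - x**2 + 4 * x + 7


table = difference_table([cubic(x) for x in range(6)])
for order, row in enumerate(table):
    print(order, row)
```

The 3rd differences all come out equal (here 12) and the 4th and 5th
vanish, as the argument requires.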

The  formation  of  the  successive  differences  is  conveniently  shown 
in  Table  (23). 


Table (23). Successive Differences of Ordinates.

      First       Second       Third            Fourth               Fifth
y     difference  difference   difference       difference           difference
      Δ           Δ²           Δ³               Δ⁴                   Δ⁵

y0    y1-y0       y2-2y1+y0    y3-3y2+3y1-y0    y4-4y3+6y2-4y1+y0    y5-5y4+10y3-10y2+5y1-y0
y1    y2-y1       y3-2y2+y1    y4-3y3+3y2-y1    y5-4y4+6y3-4y2+y1
y2    y3-y2       y4-2y3+y2    y5-3y4+3y3-y2
y3    y4-y3       y5-2y4+y3
y4    y5-y4
y5

The law of formation should be apparent from this table, for it
is precisely that which we meet in the binomial expansion, e.g. the
nth difference is of type

$$y_n - n\,y_{n-1} + \frac{n(n-1)}{1\cdot 2}\,y_{n-2} - \frac{n(n-1)(n-2)}{1\cdot 2\cdot 3}\,y_{n-3} + \dots + (-1)^n y_0,$$

and by equating to zero the (n+1)th difference we have the relation
required between the y's.

Example. Let us apply this method to the 'Poor Relief' example
already considered. Since there are four known points the relation
between x and y must be of the form

$$y = c_0 + c_1 x + c_2 x^2 + c_3 x^3$$

as before. Hence the 4th differences must vanish, and taking the
points in order from years 1835 to 1855 as
$(x_0, y_0), (x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4)$, we get

$$y_4 - 4y_3 + 6y_2 - 4y_1 + y_0 = 0$$

as the formula connecting five y's, four known and one ($y_2$) unknown.

Therefore

$$\begin{aligned}
6y_2 &= 4(y_1 + y_3) - (y_0 + y_4) \\
     &= 4(4577 + 5395) - (5526 + 5890),
\end{aligned}$$

which is equivalent to equation (4) above.

Thus $y_2$ = £4,745,000.
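
The same relation can be applied mechanically to any run of equally
spaced ordinates with a single gap. The Python sketch below is one
possible arrangement; the function name `missing_ordinate` and the use
of `None` to mark the gap are assumptions of the example.

```python
from fractions import Fraction
from math import comb


def missing_ordinate(ys):
    """Fill the single None among equally spaced integer ordinates by equating
    the highest-order difference, sum of (-1)**(m-k) * C(m, k) * y_k, to zero."""
    m = len(ys) - 1
    gap = ys.index(None)
    coeff = [(-1) ** (m - k) * comb(m, k) for k in range(m + 1)]
    known = sum(c * y for c, y in zip(coeff, ys) if y is not None)
    return Fraction(-known, coeff[gap])


# Poor Relief figures for 1835 to 1855, with 1845 missing (thousands of pounds).
estimate = missing_ordinate([5526, 4577, None, 5395, 5890])
print(float(estimate))
```

For five ordinates the coefficients are 1, -4, 6, -4, 1, so the
function reproduces the formula of the text and returns 28472/6, or
about 4745.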

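Third Method (by means of advancing differences). In the last
method we employed a relation connecting $y_n$ with all the preceding
y's, but it is possible also to express $y_n$ in terms of $y_0$ and the
successive differences, which may be written
$\Delta, \Delta^2, \Delta^3, \dots \Delta^n$;
we have, in fact, with the notation of Table (23):

$$\Delta_0 = y_1 - y_0, \quad \Delta_0^2 = y_2 - 2y_1 + y_0, \quad \Delta_0^3 = y_3 - 3y_2 + 3y_1 - y_0, \ \dots$$

Thus

$$\begin{aligned}
y_1 &= y_0 + \Delta_0, \\
y_2 &= 2y_1 - y_0 + \Delta_0^2 = y_0 + 2\Delta_0 + \Delta_0^2, \\
y_3 &= 3y_2 - 3y_1 + y_0 + \Delta_0^3 \\
    &= 3(y_0 + 2\Delta_0 + \Delta_0^2) - 3(y_0 + \Delta_0) + y_0 + \Delta_0^3 \\
    &= y_0 + 3\Delta_0 + 3\Delta_0^2 + \Delta_0^3, \\
y_4 &= 4y_3 - 6y_2 + 4y_1 - y_0 + \Delta_0^4 \\
    &= 4(y_0 + 3\Delta_0 + 3\Delta_0^2 + \Delta_0^3) - 6(y_0 + 2\Delta_0 + \Delta_0^2) + 4(y_0 + \Delta_0) - y_0 + \Delta_0^4 \\
    &= y_0 + 4\Delta_0 + 6\Delta_0^2 + 4\Delta_0^3 + \Delta_0^4.
\end{aligned}$$

Here again the law of formation is clear, and it is readily established
by induction that, for all positive integral values of n,

$$y_n = y_0 + n\,\Delta_0 + \frac{n(n-1)}{1\cdot 2}\,\Delta_0^2 + \frac{n(n-1)(n-2)}{1\cdot 2\cdot 3}\,\Delta_0^3 + \dots \qquad (5)$$

a series which automatically comes to an end at the term $\Delta_0^n$.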
An extension of this formula is obtained by writing θ in place
of n, where 0 < θ < 1. We then get

$$y_\theta = y_0 + \theta\,\Delta_0 - \frac{\theta(1-\theta)}{1\cdot 2}\,\Delta_0^2 + \frac{\theta(1-\theta)(2-\theta)}{1\cdot 2\cdot 3}\,\Delta_0^3 - \dots \qquad (6)$$

which enables us to interpolate for a y in between any two of a series
of y's corresponding to x's advancing by equal steps. This relation
is no longer identically true as was (5), for the series on the right
in (6) is unending, but its application in practice is justified when,
as the differences advance, the numbers obtained tend to grow
smaller and smaller, so that the remainder after a certain number
of terms can be treated as negligible. Unless this tendency is
realized without carrying the differences far the formula is not
very satisfactory.

To illustrate the method of procedure the following figures may
be used from Table (7), p. 25:

Table (24). Marks obtained by certain Candidates
in an Examination

                      No. of        First        Second       Third
No. of Marks.         Candidates.   difference   difference   difference
                      y             Δ            Δ²           Δ³

Not more than 45      447           37           -16           1
Not more than 50      484           21           -15          12
Not more than 55      505            6            -3
Not more than 60      511            3
Not more than 65      514

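Suppose now we wish to know the number of candidates who
obtained a number of marks not more than 48. In that case, in
applying formula (6), we have

$$y_0 = 447, \quad \theta = (48-45)/(50-45) = 3/5,$$
$$\Delta_0 = 37, \quad \Delta_0^2 = -16, \quad \Delta_0^3 = 1,$$

and hence, up to this order of differences, the required number of
candidates is given by

$$\begin{aligned}
447 &+ \tfrac{3}{5}\cdot 37 - \frac{\tfrac{3}{5}\cdot\tfrac{2}{5}}{1\cdot 2}\,(-16) + \frac{\tfrac{3}{5}\cdot\tfrac{2}{5}\cdot\tfrac{7}{5}}{1\cdot 2\cdot 3}\,(1) \\
&= 447 + 22.2 + 1.92 + 0.06 \\
&= 471, \text{ approximately.}
\end{aligned}$$

Also, number of candidates obtaining more than 48 marks, but not
more than 50

$$= 484 - 471 = 13, \text{ approximately.}$$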
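
The advancing-difference formula lends itself to a short routine. In
the Python sketch below the function names are illustrative, and the
series is simply cut off at a stated order of differences, as in the
worked example.

```python
def forward_differences(ys):
    """Leading differences [y0, D0, D0^2, ...] of equally spaced ordinates."""
    leading, row = [], list(ys)
    while row:
        leading.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return leading


def advancing_difference_estimate(ys, theta, order):
    """Sum the series of formula (6) up to the difference of the given order."""
    leading = forward_differences(ys)
    total, coefficient = 0.0, 1.0
    for k in range(order + 1):
        total += coefficient * leading[k]
        coefficient *= (theta - k) / (k + 1)   # next binomial-type coefficient
    return total


candidates = [447, 484, 505, 511, 514]   # not more than 45, 50, 55, 60, 65 marks
theta = (48 - 45) / (50 - 45)
estimate = advancing_difference_estimate(candidates, theta, order=3)
print(round(estimate, 2))
```

The coefficient θ(θ-1)/2 computed here is the same quantity as
-θ(1-θ)/(1·2) in formula (6), merely written with the signs absorbed.
The routine gives about 471.18, i.e. 471 candidates to the nearest
unit.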
Fourth Method (by means of Lagrange's Formula). We shall
consider one more formula, due to the famous French mathematician
Lagrange (1736-1813), which is useful when the recorded y's correspond
to x's which advance by unequal stages.

Let the given statistics be represented as before by

$$(x_0, y_0),\ (x_1, y_1),\ (x_2, y_2),\ \dots\ (x_n, y_n),$$

and consider the equation

$$\begin{aligned}
y ={}& y_0\,\frac{(x-x_1)(x-x_2)\dots(x-x_n)}{(x_0-x_1)(x_0-x_2)\dots(x_0-x_n)} \\
     &+ y_1\,\frac{(x-x_0)(x-x_2)\dots(x-x_n)}{(x_1-x_0)(x_1-x_2)\dots(x_1-x_n)} \\
     &+ \dots \\
     &+ y_n\,\frac{(x-x_0)(x-x_1)\dots(x-x_{n-1})}{(x_n-x_0)(x_n-x_1)\dots(x_n-x_{n-1})}. \qquad (7)
\end{aligned}$$

It is of the nth degree in x, and it is identically satisfied by the
(n+1) pairs of values

$$(x = x_0,\ y = y_0),\ (x = x_1,\ y = y_1),\ \dots\ (x = x_n,\ y = y_n).$$

It will therefore clearly serve as the fitting curve

$$y = c_0 + c_1 x + c_2 x^2 + \dots + c_n x^n,$$

being exactly of this type, and in order to get the y corresponding
to any other x we have only to substitute that value of x in (7).

Example. The following figures, based upon data from Porter's
The Progress of the Nation, show the age distribution of criminals
in the year 1842.

    Percentage of criminals up to age 25 = 52.0 (y0).
    Percentage of criminals up to age 30 = 67.3 (y1).
    Percentage of criminals up to age 40 = 84.1 (y2).
    Percentage of criminals up to age 50 = 92.4 (y3).

Let us employ Lagrange's formula to find the approximate
percentage of criminals up to 35 years of age, making use of the
four ordinates given, and taking x = 35. We have

$$\begin{aligned}
y ={}& 52.0\,\frac{(35-30)(35-40)(35-50)}{(25-30)(25-40)(25-50)} + 67.3\,\frac{(35-25)(35-40)(35-50)}{(30-25)(30-40)(30-50)} \\
     &+ 84.1\,\frac{(35-25)(35-30)(35-50)}{(40-25)(40-30)(40-50)} + 92.4\,\frac{(35-25)(35-30)(35-40)}{(50-25)(50-30)(50-40)} \\
  ={}& -10.4 + 50.475 + 42.05 - 4.62 \\
  ={}& 77.5.
\end{aligned}$$
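
Lagrange's formula is readily evaluated by a double loop over the
known points. The following Python sketch (the function name
`lagrange` is illustrative) repeats the calculation for the age
distribution of criminals.

```python
def lagrange(points, x):
    """Value at x of the polynomial passing through the given (x_i, y_i) points."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Percentage of criminals up to ages 25, 30, 40 and 50 (unequal steps).
ages = [(25, 52.0), (30, 67.3), (40, 84.1), (50, 92.4)]
print(round(lagrange(ages, 35), 3))
```

At any of the recorded ages the routine simply returns the recorded
percentage, since every term but one then vanishes; at age 35 it gives
77.505, which rounds to the 77.5 found above.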
Reasoning made Clear with the Help of Graphs or Curves. The
graphical method not only produces an instructive picture of a
scheme of observations, but it may also be used effectively on
occasion to pilot one through the intricacies of economic or similar
argument. The eye is a very ready pupil and is quick to pass on
what it sees to the mind; it acts, that is to say, as an ally to the
understanding, which might get on without it, but which certainly
gets on faster with it.

To illustrate this we shall consider the first principles of an
interesting class of curves relating to supply and demand.*

Curve of Demand. Conceive a smoker who buys cigarettes at
the rate of x per day, and pays for them at the rate of y pence each.
Altogether they cost him therefore a sum of xy pence per day,
which is conveniently measured by the rectangle OABC in fig. (13).
Notice that the cost price of each single cigarette is here represented
by the area (y × 1), while the total expenditure is represented by the
area (y × x).

[Fig. (13). Horizontal axis: number of cigarettes bought.]

Now let us suppose his country is at war and that the smoker,
to put himself in a position to discourage luxuries, decides to give
up smoking. Let us try to measure in terms of pence the
cost of this great sacrifice to him on the first day.

The first cigarette is probably the hardest to do without, and
the desire for it is so strong that, if it were a mere matter
of money and not of patriotism, he would be willing to give as
many pence as are represented, say, by the rectangle 1-1 in
fig. (14) in order to have it to smoke. If he went on to bargain
with himself in imagination, he would not be ready to offer quite
so much for the satisfaction of a second smoke soon after the
first: he would perhaps only give a number of pence represented
by the rectangle 2-2 in the figure for this second cigarette. And
if it came to a third he would offer less still, only '3-3' pence
perhaps, for the fourth '4-4' pence, and so on. The rectangles
here are of varying height, but each stands on a base of unit length.
Thus we find that the total sum he would be prepared to offer,
bargaining for cigarette after cigarette in this way, would be
represented by the sum of the rectangles 1-1, 2-2, 3-3 ... in fig. (14),
where the addition of each unit length along OX means one more
cigarette in imagination smoked, and a diminution of unit length
in an ordinate parallel to OY means a reduction of 1d. per cigarette
in the price the smoker would be prepared to pay.

[Fig. (14). Horizontal axis: number of cigarettes bought.]

[* A fuller account of these curves will be found in Cunynghame's Geometrical
Political Economy, where a rather more accurate interpretation of "surplus
value" is given, involving the introduction of subordinate curves. The
simplified statement here adopted seemed sufficient in an introductory course.
Marshall's Principles of Economics also contains many fascinating illustrations
of the use of such curves, mainly in footnotes.]

But if he fell a prey to his persistent craving and actually bought
a number of cigarettes represented by OA in the figure, each would
cost him in the ordinary way only a number of pence represented
by AB, say, i.e. area (AB × 1), and his total expenditure would thus
be measured by the area of the rectangle OABC. He would get
them, that is to say, for less than he would be prepared to give
rather than go without them. The difference, the area of the
rectangles making up the portion BCDE of fig. (14), represents the
measure in pence of surplus enjoyment which he would obtain free
of charge, or it represents the measure of free sacrifice he
makes if he is true to his patriotic principles.

Let us now take an example on a larger scale. Imagine a
small community of people, producers and consumers, buying
and selling among themselves. Some of them are
coalowners and sell coal to the others in the open market,
where competition is supposed free and unrestricted in any way. This
last condition is emphasized, because it is seldom perfectly satisfied
in the real world of commerce.

Just as in the previous case we may represent the number of
cwts. of coal bought by a length OA measured along OX in fig. (15),
and the price actually paid in shillings per cwt. by the area of a
rectangle on unit base and of height OC along OY. Thus the
total cost to the consumers in shillings is measured by the area of
the rectangle OABC.

[Fig. (15). Horizontal axis: number of cwts. of coal bought.]

But  here  again  we  may  picture  the  consumers  during  a  coal 
shortage,  when,  rather  than  go  without  the  first  cwt.  of  coal,  some 
one  among  them  would  be  ready  to  offer  for  it  as  many  shillings  as 
are  represented  by  the  rectangle  1-1  in  fig.  (15),  and  for  the  second 
cwt.  some  one  would  be  ready  to  offer  '  2-2  '  shillings,  for  the  third 
'  3-3  '  shillings,  and  so  on.  The  demand  for  coal  could  thus  be 
measured  in  shillings  by  the  sum  of  the  rectangles  1-1,  2-2,  3-3 
.  .  .  and,  if  OA  runs  into  thousands  of  units  of  coal,  the  lengths 
0-1,  1-2,  2-3  .  .  .  along  OX,  corresponding  to  additions  of  1  cwt. 
in  the  quantity  bought,  would  in  the  limit  be  so  small  that  the 
sum  of  the  rectangles  would  become  practically  equivalent  to  the 
curvilinear  area  OAED  in  the  figure,  where  DE  is  a  curve  drawn 
through  the  summits  of  the  rectangles,  namely  the  curve  of  demand. 

The  consumers'  surplus  in  this  case  would  be  measured  in  shillings 
by  the  area  BCDE,  this  being  the  difference  between  the  measures 
of  the  sum  actually  paid  for  the  coal  bought  and  the  sum  consumers 
would  have  been  willing  to  pay  rather  than  go  without  it. 
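
The reckoning of consumers' surplus can be illustrated with a small
numerical case. In the Python sketch below the demand schedule and the
market price are hypothetical figures invented for the example; only
the method of reckoning follows the text.

```python
# Hypothetical demand schedule: shillings offered for the 1st, 2nd, 3rd ... cwt.
offers = [10, 8, 7, 6, 5, 4]
market_price = 4                  # shillings per cwt. actually paid (height OC)
quantity = len(offers)            # number of cwts. bought (length OA)

willing_to_pay = sum(offers)                        # rectangles 1-1, 2-2, ... (area OAED)
actually_paid = market_price * quantity             # rectangle OABC
consumers_surplus = willing_to_pay - actually_paid  # area BCDE

print(willing_to_pay, actually_paid, consumers_surplus)
```

With these figures the consumers would have offered 40 shillings in
all, they pay 24, and the surplus is 16 shillings.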

Curve  of  Supply.  Now  let  us  consider  the  question  from  the 
point  of  view  of  the  coalowners.  We  shall  assume  that  the  average 
cost  of  production  per  cwt.  of 
coal  increases  steadily  as  the 
number  of  cwts.  produced  in- 
creases ;  this  would  not  be  an 
unreasonable  assumption  in  most 
cases  after  passing  a  certain  point, 
since  the  richer  coal  measures 
known  are  likely  to  be  mined 
before  the  poorer  ones,  and  the 
cost  of  mining  near  the  surface 
is  bound  to  be  less  than  when 
deep  shafts  have  to  be  bored. 

If,  then,  OA,  fig.  (16),  represents  the  number  of  cwts.  of  coal 
sold,  and  if  the  price  in  shillings  per  cwt.  at  which  it  is  sold  is  de- 
noted by  the  area  of  a  rectangle  on  unit  base  and  of  height  OC 
along OY, the total payment received by the coalowners will be
measured  in  shillings  by  the  area  of  the  rectangle  OABC. 

But  the  cost  of  producing  the  first  cwt.  is  perhaps  measured 
by  the  rectangle  1-1,  that  of  producing  the  second  cwt.  by  the 
rectangle  2-2,  the  third  by  the  rectangle  3-3,  and  so  on,  each  rectangle 
being drawn on unit base representing an advance of 1 cwt. (The
advance in the cost of production would not in reality be measured
by so much the cwt. of course, but the assumption is inaccurate
in degree only, not in principle, and, by making it, the argument
is rendered clearer.) Thus the actual cost of production is, in the
limit when OA is very large and divided up into relatively very
small parts, measured in shillings by the curvilinear area OAED,
where DE is a curve drawn through the summits of the rectangles,
namely, the curve of supply.

[Fig. (16). Horizontal axis: number of cwts. of coal sold.]

The difference, BCDE, between the areas OABC and OAED
represents  what  is  known  as  producers'  surplus,  for  it  measures  the 
profit  made  by  the  owners  in  selling  the  coal  at  a  higher  price  than 
the  cost  price  of  production. 
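
Producers' surplus may be illustrated in the same way. The sketch below assumes a hypothetical straight-line curve of supply (`supply_cost` and the selling price of 4s. are invented for illustration): the cost of each successive cwt. is a rectangle on unit base, and the surplus is the receipts OABC less the sum of these rectangles.

```python
def supply_cost(q):
    """Hypothetical supply curve: cost in shillings of producing the q-th cwt."""
    return 1.0 + 0.001 * q


def producers_surplus(quantity, price):
    """Receipts (rectangle OABC) less the cost of production (area OAED)."""
    receipts = price * quantity
    # One rectangle on unit base for each cwt. produced.
    cost = sum(supply_cost(i + 0.5) for i in range(int(quantity)))
    return receipts - cost


profit = producers_surplus(3000, price=4.0)
print(round(profit, 2))
```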

Now  let  us  combine  the  curve  of  supply  (S.C.)  and  the  curve  of 
demand (D.C.) in the same figure, fig. (17). Their meeting point
P  determines  the  number  of  cwts. 
of  coal  bought  (x),  and  the  selling 
price  in  shillings  per  cwt.  (y). 
For  it  is  clear  that  under  normal 
conditions  it  would  not  be  profit- 
able to  coal  producers  to  pass  this 
point,  because  beyond  it  the  de- 
mand on  the  part  of  coal  consumers 
measured  in  money  is  less  than 
the  cost  of  production  :  they  are 
not  willing  on  the  average  to  pay 
so much as y s. per cwt. for it,
and it costs more than y s. per cwt.
on the average to produce. If,
on the other hand, the amount of coal produced decreases below
x cwts., the greater this decrease the higher does the profit become
on  the  sale  of  it,  because  the  greater  is  the  difference  between  the 
cost  price  and  the  selling  price  ;  hence,  as  profits  become  more 
pronounced,  recruits  will  be  attracted  into  the  coal-producing 
business,  and,  if  this  goes  on,  deeper  shafts  will  have  to  be  bored 
and  poorer  fields  worked  until  profits  begin  to  decrease  again  and 
the  supply  once  more  approaches  x  cwts.  Thus  sooner  or  later 
the  production  of  coal  and  its  market  price  will  tend  to  the  level 
determined  by  the  equilibrium  point  P  where  the  supply  and 
demand  curves  meet. 
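
As a rough numerical illustration of the equilibrium point, suppose, purely for the sake of example, that both curves are straight lines, demand y = a - bx and supply y = c + dx. The coefficients below are assumptions and not figures from the text. The meeting point P then follows by simple algebra, and the argument on either side of P can be checked directly.

```python
def demand_price(q, a=10.0, b=0.002):
    """Hypothetical demand curve: price offered for the q-th cwt."""
    return a - b * q


def supply_cost(q, c=1.0, d=0.001):
    """Hypothetical supply curve: cost of producing the q-th cwt."""
    return c + d * q


def equilibrium(a=10.0, b=0.002, c=1.0, d=0.001):
    """Meeting point P of the two straight lines: a - b*x = c + d*x."""
    x = (a - c) / (b + d)
    return x, demand_price(x, a, b)


x, y = equilibrium()
print(round(x), round(y, 2))

# Beyond P the demand price falls short of the cost of production;
# short of P the demand price exceeds it.
beyond = demand_price(x + 100) < supply_cost(x + 100)
short = demand_price(x - 100) > supply_cost(x - 100)
print(beyond, short)
```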

Endless  varieties  of  problems  may  be  discussed  by  altering  the 
conditions  and  observing  the  effect  produced  in  the  standard 
diagram.    Three  examples  will  suffice  to  illustrate  the  method. 


[Fig. (17). Horizontal axis: number of cwts. of coal bought or sold. S.C. = supply curve; D.C. = demand curve.]

1. Effect of a Change in Normal Demand. Here we suppose the
normal  conditions  of  supply  are  unaltered — it  costs  just  as  much 
as  before  to  produce  the  same  amount  of  the  commodity  in  question  ; 
but  a  more  eager  demand  on  the  part  of  consumers  shows  itself  in  a 
readiness  to  purchase  more  at  any  given  price  than  would  have 
been  purchased  under  the  old  conditions  :  this  may  conceivably 
be  due  to  a  general  increase  in  the  purchasing  power  of  these  con- 
sumers, or  it  may  be  the  result  of  a  shortage  of  some  other  com- 
modity which  causes  this  one  to  be  more  widely  used,  just  as 
margarine,  for  instance,  has  been  known  to  take  the  place  of  butter  ; 
whatever  the  reason  may  be,  the  effect  is  that  the  demand  curve 
now occupies a higher level throughout its length, D'.C. in place of
D.C. in the figures.

[Figure: Decreasing Return.]

When we turn to the supply side of the question, there are three
stages which, although they shade into one another in practice, it
is  well  to  separate  clearly  in  theory  :  (1)  the  only  supplies  immedi- 
ately available  are  those  actually  in  the  hands  of  dealers  ;  (2)  to 
meet  the  increased  demand,  and  so  earn  for  themselves  increased 
profits, manufacturers will speed up production, by working overtime,
etc., with the help possibly of any disengaged labour or
capital they may be able to secure, and the resulting extra supplies
will  be  available  after  a  short  time  ;  (3)  if  the  demand  continues 
unabated,  manufacturers,  by  offering  higher  wages  and  interest, 
will  seek  to  attract  fresh  labour  and  capital  from  other  engagements 
into  their  business,  and,  by  renewing  their  machinery  and  generally 
improving  their  organization,  they  will  produce  on  a  larger  and 
relatively  more  economical  scale.  Moreover,  other  manufacturers, 
seeing  the  profits  to  be  earned,  will  be  attracted  into  the  same  line 
of  business  also,  so  that  by  this  time  the  current  available  supplies 
of the commodity may exceed very appreciably their old figure.
But all this happens only in the long run, and the economist has
always  to  bear  this  extremely  important  element  of  time  carefully 
in  mind  when  he  seeks  to  estimate  the  effects  of  any  proposed 
action. 

[Figure: Increasing Return.]

We assume then that the new demand remains long enough at
its higher level to allow for the gradual adjustment in this way of
supply to the changed conditions, and for the economic forces called
into play once again to arrive at a balance between them, most
likely at a new equilibrium point. The two figures illustrate the
difference in effect according as the production of the commodity is
subject to a decreasing or
an increasing return, i.e. according as the cost of production rises
or  falls  when  the  amount  produced  is  increased.  In  both  cases  it 
will  be  noted  that  more  of  the  commodity  is  produced  (ON'  in  place 
of  ON)  in  answer  to  the  keener  demand,  but  the  difference  is  much 
greater  in  the  second  case  than  in  the  first.  Also  the  price  has 
gone  up  in  the  first  case,  while  in  the  second  it  has  gone  down, 
the  difference  being  measured  by  the  change  in  PN. 
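
The contrast between the two cases may be sketched numerically. The straight-line curves below are hypothetical, chosen only so that both cases start from the same equilibrium; a supply line sloping upward stands for a decreasing return, and one sloping gently downward for an increasing return.

```python
def equilibrium(a, b, c, d):
    """Meeting point of demand y = a - b*x and supply y = c + d*x."""
    x = (a - c) / (b + d)
    return x, a - b * x


b = 0.002
old_a, new_a = 10.0, 11.5   # demand curve raised bodily from D.C. to D'.C.

cases = {
    "decreasing return": (1.0, 0.001),    # cost rises as output grows
    "increasing return": (7.0, -0.001),   # cost falls as output grows
}

results = {}
for name, (c, d) in cases.items():
    n_old, p_old = equilibrium(old_a, b, c, d)
    n_new, p_new = equilibrium(new_a, b, c, d)
    # Change in quantity (ON to ON') and change in price (PN).
    results[name] = (n_new - n_old, p_new - p_old)
    print(name, round(n_new - n_old), round(p_new - p_old, 2))
```

Under these assumed figures the output rises in both cases, but by more under an increasing return, and the price rises in the first case and falls in the second, as described above.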

2. Effect of a Tax. If the tax is at the rate of so much per unit
(say 1s. per unit, if the price is measured in shillings) of the
commodity produced, this will raise the supply curve, S.C., bodily up
a distance of 1 unit into the position S'.C., fig. (18), because the
effect is the same as if 1s. were added to the cost of each unit in
production. The production will thus be diminished by N'N units,
for P' is the new equilibrium point; the selling price will be
increased by P'M s. per unit (by less, it should be noted, than P'Q
or K'K, the amount of the tax); producers' surplus, which is
analogous to what economists term rent, is diminished by
(area KPL - area K'P'L') s.; consumers' surplus is diminished by
(area PLL'P') s.; finally, the tax produces for the Treasury a number
of shillings represented by a rectangle with sides of length ON'
and KK'.

[Fig. (18).]
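
A small calculation shows the same effects in figures. It is a sketch under assumed straight-line curves (the coefficients and the tax of 1s. per unit are hypothetical); with straight lines each surplus is a triangle, which keeps the arithmetic simple.

```python
def equilibrium(a, b, c, d):
    """Meeting point of demand y = a - b*x and supply y = c + d*x."""
    x = (a - c) / (b + d)
    return x, a - b * x


a, b, c, d = 10.0, 0.002, 1.0, 0.001   # hypothetical curves
tax = 1.0                               # shillings per unit

n_old, price_old = equilibrium(a, b, c, d)
n_new, price_new = equilibrium(a, b, c + tax, d)   # S.C. raised bodily by the tax

price_rise = price_new - price_old
revenue = tax * n_new                  # rectangle with sides ON' and KK'

# With straight-line curves producers' surplus is d*n*n/2 and
# consumers' surplus is b*n*n/2, so the losses are differences of triangles.
producers_loss = 0.5 * d * (n_old ** 2 - n_new ** 2)
consumers_loss = 0.5 * b * (n_old ** 2 - n_new ** 2)

print(round(n_old - n_new, 1), round(price_rise, 3), round(revenue, 1))
print(round(producers_loss, 1), round(consumers_loss, 1))
```

The price rises by less than the tax because part of the burden falls on the producers, whose net receipt per unit is reduced.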

3. Effect of a Monopoly. A monopolist has the power to stop
production short of the true equilibrium point, so that ON' cwts.,
fig. (19), are produced in place of the ON cwts. which free competition
would demand. The selling price is thus raised by Q'S s. per
cwt.; producers' surplus is increased by (area KP'Q'M' - area
KPL) s.; while consumers' surplus is diminished by (area PLD - area
DM'Q') s.
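
The two changes in surplus can be illustrated with assumed figures. In the sketch below the straight-line curves and the restricted output of 1800 cwts. are hypothetical; the text does not say how far short of the equilibrium point the monopolist stops.

```python
a, b, c, d = 10.0, 0.002, 1.0, 0.001   # hypothetical straight-line curves


def surpluses(n):
    """Producers' and consumers' surplus when n cwts. sell at the demand price."""
    price = a - b * n
    cost = c * n + 0.5 * d * n ** 2          # area under the supply curve up to n
    willingness = a * n - 0.5 * b * n ** 2   # area under the demand curve up to n
    return price * n - cost, willingness - price * n


n_free = (a - c) / (b + d)   # ON, the output under free competition
n_mono = 1800.0              # ON', an assumed restricted output

ps_free, cs_free = surpluses(n_free)
ps_mono, cs_mono = surpluses(n_mono)

print(round(ps_mono - ps_free, 1))   # gain in producers' surplus
print(round(cs_free - cs_mono, 1))   # loss in consumers' surplus
```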

A word of explanation is necessary before leaving the subject of
these supply and demand curves. It is probable that the reader will
have questioned the possibility of drawing such curves for any
commodity with sufficient accuracy to be of any value, but it would be
enough  as  a  rule  to  be  able  to  estimate  what  would  happen 
if  a  slight  variation  occurred  in  price  or  in  production,  and  such 
an  estimate  may  sometimes  be  made  by  actual  trial :  e.g.  a  good 
practical  farmer  most  likely  knows  nothing  about  supply  and 
demand  curves  as  such,  yet  from  past  experience  he  has  a  pretty 
shrewd  notion  as  to  how  far  it  may  be  profitable  to  spend  an  extra 
pound  here  in  rearing  calves  and  a  pound  less  there  in  cultivating 
crops, bearing in mind the prices which cattle and corn might be
expected  to  fetch.  From  his  point  of  view  the  interest  of  the 
curves,  if  he  knew  anything  of  them,  would  be  centred  in  those 
portions  which  correspond  to  normal  conditions,  i.e.  somewhere  in 
the  neighbourhood  of  the  equilibrium  point  under  the  free  play  of 
ordinary  competition. 

Their  real  value,  however,  as  suggested  at  the  beginning,  does 
not  consist  in  the  practical  assistance  which  they  afford  to  the  pro- 
ducer or  consumer,  by  way  of  foretelling  the  actual  measure  of 
consumption  or  production,  so  much  as  in  the  light  they  throw 
upon  general  tendencies  which  are  rather  apt  to  be  obscured  if  they 
are  ponderously  presented  with  elaborate  economic  argument. 
They  make  plain  in  a  moment  to  the  eye  what  can  only  be  stated 
in  two  or  three  pages  of  writing. 


CHAPTER   X 

CORRELATION

One  of  the  most  important  questions  which  can  be  discussed  by 
statistical  methods  is  that  of  possible  connection,  or  correlation,  as 
it  is  called,  between  two  sets  of  phenomena.  If  some  factor  in 
each  can  be  isolated  and  measured  numerically,  our  object  is  to 
discover  if  the  size  of  either  is  sympathetically  affected  when  a 
change  occurs  in  the  size  of  the  other  ;  or,  to  put  the  matter  in 
another  way,  do  large  values  of  the  one  factor  go  with  large  values 
of  the  other  factor  and  small  with  small,  or  vice  versa  ?  And,  if 
some  mutual  dependence  of  this  kind  exists,  can  an  estimate  of 
its  extent  be  made  ? 

Consider,  for  example,  the  factor  or  character  of  height  in  husband 
and  wife.  Is  there  any  connection  between  stature  of  husband  (x) 
and  stature  of  wife  (y)  ?  Do  tall  men  tend  on  the  average  to  wed 
tall  women,  or  do  we  find  tall  men  choosing  short  women  for  wives 
just about as often as they choose tall women? When correlation
exists we shall want some measure for it which will tell us
the amount of change or deviation from the average in either
character associated with a given change or deviation from the
average in the other.

In  studying  graphs  we  saw  how 
some  hint  of  the  existence  of 
correlation  might  be  discovered, 
but  we  wish  now  to  go  a  little 
more  deeply  into  the  subject. 
The  first  step  is  to  measure  an 
adequate  number  of  pairs  of  values,  x  and  y,  of  the  characters 
concerned  in  order  to  find  what  values  are  associated  together, 
and  how  frequently  the  same  values  are  repeated.  When  this  is 
done  we  can  draw  up  a  table  of  double  entry,  see  fig.  (20),  setting 
out in rows and columns the frequencies observed. An examination
of Table (25), showing the variation of brain weight with age
in the case of 197 Bohemian women, will make clear what is meant.
The x's, $x_1, x_2, \ldots$, and the y's, $y_1, y_2, \ldots$, are supposed to
ascend in magnitude, and when, for example, the pair of values
$(x_2, y_3)$ is observed to be repeated nine times, the number 9 is placed
in the second column and third row of the table, so that the frequency
of each class is found recorded in the square proper to it: thus,
out of the sample in Table (25), there are 10 women between the
ages of 40 and 50 with brains weighing between 1300 and 1400
grams.


Table (25). Variation of Brain Weight with Age in the
Case of certain Bohemian Women.

[Data from Biometrika, vol. iv. pp. 13 et seq., Variation and Correlation
in Brain Weight, by Raymond Pearl.]

                              Age in years (x)
Brain weight          x1      x2      x3      x4      x5      x6
in grams (y)       20-30   30-40   40-50   50-60   60-70   70-80   Totals
y1  1000-1100          1       -       1       1       -       -        3
y2  1100-1200          2       2       4       2       5       4       19
y3  1200-1300         28       9       8      14      10       4       73
y4  1300-1400         26      14      10       6       5       4       65
y5  1400-1500         13       7       7       2       -       2       31
y6  1500-1600          2       3       -       1       -       -        6
Totals                72      35      30      26      20      14      197
Mean y              1325    1350    1310    1285    1250    1279

When each class interval, as in this table, includes a small range
of values, the x and y may, as an approximation, be taken as the
mid values of their class intervals: $y_3$ would be taken, for instance,
as 1250, though it really includes all values between 1200 and
1300 grams. Strictly in such cases each single observation is not,
geometrically speaking, located at a definite point, but lies somewhere
within a small area, though it is treated as if it had the values
x and y which apply to the centre point of the area. It is sometimes
possible to correct for this assumption by what is known as
Sheppard's adjustment, but we shall not concern ourselves with
the correction in the present discussion, so as to avoid complications,
because the difference made is not generally large.

The  table,  when  drawn  up,  may  immediately  suggest  some 
intimate  connection  between  x  and  y.  It  may  indicate  that  as 
x increases y also in general increases, or that y tends to fall in
value as x grows bigger. But a more refined analysis is necessary.
It would be instructive perhaps to travel along the row of x's, finding
what mean value of y is associated with $x_1$, what mean value
of y is associated with $x_2$, and so on. This would give a sounder
basis for judging whether, as x increased, y in general increased or
decreased as the case might be: for example, in Table (25) the
mean values of y associated with the several types of x are shown
in their proper columns at the foot of the table and clearly, as
x increases, y tends to decrease, apart from conflicting readings at
the  beginning  and  end  of  the  table,  and  the  latter  of  these  may  not 
be  significant  of  any  real  difference  in  brain  weight  at  the  end  of 
life,  for  it  is  only  based  on  fourteen  observations  ;  generally,  the 
inference  from  this  table  would  be  that  the  weight  of  the  brain 
decreases  as  the  age  increases  after  maturity  is  once  reached, 
although,  of  course,  it  would  be  rash  to  make  more  than  a  tentative 
statement  with  so  small  a  sample  at  our  disposal. 
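
The means at the foot of Table (25) can be reproduced from the frequencies, taking each brain weight at the mid value of its class. The sketch below enters blank cells as 0; the position of the blank cells in the 1000-1100 and 1400-1500 rows is inferred here from the row and column totals, and the variable names are merely illustrative.

```python
# Frequencies from Table (25): rows are brain-weight classes, columns age classes.
weight_mid = [1050, 1150, 1250, 1350, 1450, 1550]   # mid values of the y classes, grams
age_classes = ["20-30", "30-40", "40-50", "50-60", "60-70", "70-80"]
freq = [
    [1, 0, 1, 1, 0, 0],
    [2, 2, 4, 2, 5, 4],
    [28, 9, 8, 14, 10, 4],
    [26, 14, 10, 6, 5, 4],
    [13, 7, 7, 2, 0, 2],
    [2, 3, 0, 1, 0, 0],
]


def column_means(freq, y_mid):
    """Mean y of each array (column), weighting each mid value by its frequency."""
    means = []
    for j in range(len(freq[0])):
        total = sum(row[j] for row in freq)
        weighted = sum(row[j] * y for row, y in zip(freq, y_mid))
        means.append(weighted / total)
    return means


mean_y = column_means(freq, weight_mid)
for age, m in zip(age_classes, mean_y):
    print(age, round(m))
```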

Let us suppose $\bar{y}_1$ to be the mean value of y associated with $x_1$,
$\bar{y}_2$ the mean value of y associated with $x_2$, $\bar{y}_3$ with $x_3$, and so on.
If these values $(x_1, \bar{y}_1)$, $(x_2, \bar{y}_2)$, $(x_3, \bar{y}_3)$, etc., are plotted, it is very
often found that they cluster more or less closely about a straight
line, see fig. (21), so that we are led to ask whether there is not
some line which will very fairly describe the run of the points;
the equation of such a line would be

$$y = mx + c,$$

and if m and c were known we could find from this equation the best
average value of y corresponding to any given x.

But, on reflection, $\bar{y}_1, \bar{y}_2, \bar{y}_3, \ldots$ are themselves only the best
y's corresponding to the particular values $x_1, x_2, x_3, \ldots$ of x, so
that the problem is really the same as that of finding the relation

$$y = mx + c,$$

based on all the observations, which will enable us to estimate the
best y corresponding to any given x.

Now for any value $x_1$ of x the value of y given by this relation
is $(mx_1 + c)$, while by observation we may find more than one value
of y corresponding to the value $x_1$ of x. If $y_1$ be one such value
the difference between it and the value given by the above relation
is

$$(mx_1 + c) - y_1.$$

This difference we may regard as the error made in estimating y
from the relation instead of taking the value given by observation,
which for the moment we think of as the true value. The best
relation will then clearly be the one which makes all such errors of
estimate as small as possible. But, algebraically, some of these
errors are positive, i.e. the value of y given by the relation is greater
than that given by observation, and some are negative, and it is
only their magnitudes that we wish to take into account. Accordingly
we follow the method used in finding the standard deviation
in order to get rid of the ambiguities of sign: we form, that is to
say, the sum of the squares of the errors, because the expression so
formed will clearly be least when each separate error is as small as
possible in absolute magnitude.

To find, then, the values of m and c which will make

$$(mx_1 + c - y_1)^2 + (mx_2 + c - y_2)^2 + \ldots + (mx_n + c - y_n)^2$$

a minimum (see Note 7 in the Appendix), where n is the total
number of pairs of observations.

The required values are given by differentiating, first with regard
to c treating m as constant, and then with regard to m treating c
as constant, putting each result equal to zero. Thus

$$(mx_1 + c - y_1) + \ldots + (mx_n + c - y_n) = 0,$$
$$x_1(mx_1 + c - y_1) + \ldots + x_n(mx_n + c - y_n) = 0.$$

Therefore

$$m(x_1 + \ldots + x_n) + nc - (y_1 + \ldots + y_n) = 0,$$
$$m(x_1^2 + \ldots + x_n^2) + c(x_1 + \ldots + x_n) - (x_1y_1 + \ldots + x_ny_n) = 0.$$

The first of these equations gives

$$m(n\bar{x}) + nc - (n\bar{y}) = 0,$$

i.e.

$$m\bar{x} + c - \bar{y} = 0,$$

where $\bar{x}$ is the mean of all the x's and $\bar{y}$ is the mean of all the y's,
and it expresses the fact that the line $y = mx + c$ passes through
the point $(\bar{x}, \bar{y})$.

This might have been expected, for, graphically, each pair of
observations $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, ... corresponds to some point,
and if we look for the line $y = mx + c$ passing through the region
where they cluster most thickly together we should certainly expect
it to pass through their mean or centre of gravity $(\bar{x}, \bar{y})$. This
suggests how the values of m and c may be considerably simplified.
If we measure all the x's from $\bar{x}$, their mean, and all the y's from $\bar{y}$,
their mean, which is equivalent to taking the point $(\bar{x}, \bar{y})$ as origin
and replacing every x by its deviation $\xi$ from $\bar{x}$ and every y by
its deviation $\eta$ from $\bar{y}$, the first of the above relations is reduced
to c = 0, and therefore the second becomes

$$m(\xi_1^2 + \ldots + \xi_n^2) - (\xi_1\eta_1 + \ldots + \xi_n\eta_n) = 0.$$

Hence

$$m = \frac{\xi_1\eta_1 + \ldots + \xi_n\eta_n}{\xi_1^2 + \ldots + \xi_n^2} = \frac{np}{n\sigma_x^2} = \frac{p}{\sigma_x^2},$$

where p is the mean of all the product pairs $\xi\eta$, and $\sigma_x$ is the standard
deviation of all the x's.

Thus the required equation for estimating the best $\eta$ corresponding
to any particular $\xi$ is

$$\eta = \frac{p}{\sigma_x^2}\,\xi,$$

whence

$$y - \bar{y} = \frac{p}{\sigma_x^2}(x - \bar{x}). \qquad (1)$$
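
The derivation can be verified on a few figures. The sketch below uses a small set of invented observations (not data from the text): it solves the two equations for m and c directly, and then confirms that the same slope is obtained as $p/\sigma_x^2$ when deviations are measured from the means.

```python
def fit_line(xs, ys):
    """Solve the two normal equations for m and c in y = m*x + c."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Equations:  m*sx + n*c = sy   and   m*sxx + c*sx = sxy
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    c = (sy - m * sx) / n
    return m, c


def slope_from_deviations(xs, ys):
    """Slope as p / sigma_x^2, with deviations measured from the means."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    p = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - xbar) ** 2 for x in xs) / n
    return p / var_x


xs = [1, 2, 3, 4, 5]   # hypothetical observations
ys = [2, 4, 5, 4, 5]

m, c = fit_line(xs, ys)
m_dev = slope_from_deviations(xs, ys)
xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
print(m, c, m_dev)
```

The fitted line also passes through the mean point, since m times the mean x, plus c, gives the mean y.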

The coefficient $p/\sigma_x^2$ in this equation evidently gives the deviation
in y from the mean $\bar{y}$ corresponding to unit deviation in x from
the mean $\bar{x}$, for when $(x - \bar{x}) = 1$, $(y - \bar{y}) = p/\sigma_x^2$. Hence the greater
this coefficient is, the greater will be the change in y resulting from,
or at all events coexistent with, unit change in x.

Thus $p/\sigma_x^2$ would seem to supply a not unreasonable measure of
the correlation between x and y. But there is something very
unsymmetrical about this result. Why should the correlation be
measured by $p/\sigma_x^2$ any more than by $p/\sigma_y^2$? In fact, we might
repeat the whole of the previous argument, interchanging x and y
throughout wherever they appear. In that case we should first
travel down the column of y's and calculate the mean values of x
associated with $y_1, y_2, y_3, \ldots$ respectively. This would give a set
of points $(\bar{x}_1, y_1)$, $(\bar{x}_2, y_2)$, $(\bar{x}_3, y_3)$, ..., which, when plotted, would
perhaps lie approximately in a straight line. We should thus be
led to look for some relation

$$x = m'y + c'$$

which would enable us to estimate the best average x corresponding
to a y of given type, and, proceeding just as before, we should
ultimately obtain the equation

$$\xi = \frac{p}{\sigma_y^2}\,\eta,$$

or

$$(x - \bar{x}) = \frac{p}{\sigma_y^2}(y - \bar{y}), \qquad (2)$$

in which the coefficient $p/\sigma_y^2$ gives now the deviation in x from the
mean $\bar{x}$ corresponding to unit deviation in y from the mean $\bar{y}$.
Hence $p/\sigma_y^2$ has, seemingly, just as much claim as $p/\sigma_x^2$ to measure
the correlation between x and y. The one gives the change in x
corresponding to unit change in y: the other gives the change in y
corresponding to unit change in x; and the only reason why they
differ is because unit change in x does not mean the same thing as
unit change in y: their standards of changeableness or variability
are not equal. If then we could alter the scales of measurement
so that unit change in each were of the same magnitude, the two
coefficients obtained ought to become identical, and we should then
have a really satisfactory measure for the correlation required.

With this object let us examine the variability of the x's and
compare it with the variability of the y's. Now the total dispersion
of the different x's on either side of $\bar{x}$, the mean x, is conveniently
measured by $\sigma_x$, their standard deviation. And similarly the
dispersion of the y's on either side of $\bar{y}$, the mean y, is measured
by $\sigma_y$. The bigger $\sigma_x$ is, the greater is the variability of the x's,
and the bigger $\sigma_y$ is, the greater is the variability of the y's. Hence,
in equations (1) and (2), $(x - \bar{x})$ should be divided by $\sigma_x$ and $(y - \bar{y})$
by $\sigma_y$ if we want to work with the same unit of change or variability
in each case. The equations then become

$$\left(\frac{y - \bar{y}}{\sigma_y}\right) = \frac{p}{\sigma_x\sigma_y}\left(\frac{x - \bar{x}}{\sigma_x}\right)$$

and

$$\left(\frac{x - \bar{x}}{\sigma_x}\right) = \frac{p}{\sigma_x\sigma_y}\left(\frac{y - \bar{y}}{\sigma_y}\right).$$

Write $r = p/(\sigma_x\sigma_y)$; then r is taken to be the coefficient of correlation,
for it measures the change in either character corresponding to
unit change in the other when the units are made comparable.
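
A short computation illustrates the two properties which recommend r as a measure: it is the same whichever character is called x, and it is unaffected by a change in the unit of measurement. The observations are invented for the purpose and the function name `correlation` is merely illustrative.

```python
from math import sqrt


def correlation(xs, ys):
    """Coefficient of correlation r = p / (sigma_x * sigma_y)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    p = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
    sigma_x = sqrt(sum((x - xbar) ** 2 for x in xs) / n)
    sigma_y = sqrt(sum((y - ybar) ** 2 for y in ys) / n)
    return p / (sigma_x * sigma_y)


xs = [1, 2, 3, 4, 5]   # hypothetical observations
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_swapped = correlation(ys, xs)                      # x and y interchanged
r_rescaled = correlation([12 * x for x in xs], ys)   # x in inches instead of feet, say
print(round(r, 4), round(r_swapped, 4), round(r_rescaled, 4))
```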

The lines giving the best y for a given x and the best x for a
given y may now be written

$$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$$

and

$$x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y}),$$

and  they  are  called  lines  of  regression.  The  term  regression  was 
first  used  by  Sir  Francis  Galton  in  a  paper  entitled  Regression 
towards  Mediocrity  in  Hereditary  Stature,  though  the  root  idea 
is  not  by  any  means  confined  to  characters  affected  by  heredity  : 
it  holds  for  any  pair  of  correlated  variables.  Galton  found  that 
if  a  number  of  tall  fathers  are  selected  and  their  heights  measured, 
the  mean  height  being  calculated,  and  if,  further,  the  heights  of  the 
sons  of  these  fathers  are  measured,  their  mean  height  being  like- 
wise calculated,  the  latter  is  not  equal  to  the  mean  height  of  the 
selected  fathers,  but  is  rather  nearer  the  mean  height  of  the  popula- 
tion as  a  whole.  There  is,  that  is  to  say,  a  regression  or  stepping 
back  of  the  variable  towards  the  general  average.  Professor  Karl 
Pearson  has  remarked  that  '  in  the  existing  state  of  our  knowledge 
the  recognition  that  the  true  method  of  approaching  the  problem 
of  heredity  is  from  the  statistical  side,  and  that  the  most  we  can 
hope at present to do is to give the probable character of the offspring
of a given ancestry, is one of the great services of Francis Galton to
Biometry.'

The expressions $r\sigma_y/\sigma_x$ and $r\sigma_x/\sigma_y$ are called coefficients of regression,
and they register in the above particular case the amount of abnormality
to be expected in the height of the sons when the amount of
abnormality in the height of the fathers is known, and vice versa.
The regression of the sons' height, y, on the fathers' height, x, is,
in fact, defined as the ratio of the average deviation of the heights
of the sons from the mean height of all sons to the deviation of the
heights of the fathers from the mean height of all fathers, and hence
it may be written

$$\frac{y - \bar{y}}{x - \bar{x}} = r\frac{\sigma_y}{\sigma_x}.$$

To make the definition more general, instead of speaking merely in
terms of height, we refer to any row or column (for there is no
intrinsic difference between row and column) in a table like
Table (25) as an array of y's or of x's, and selecting a particular
type, say a particular value of x (like fathers of height x), we define
the  regression  of  the  corresponding  array  of  y's  (like  heights  of  sons 
of  these  fathers)  on  the  type  x  to  be  the  ratio  of  the  average  devia- 
tion of  the  array  of  y's  from  the  mean  y  to  the  deviation  of  the 
selected  type  x  from  the  mean  x. 
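
The two coefficients of regression are easily computed together. The sketch below, on invented observations, also checks a consequence of their definitions which the reader may find a useful test of arithmetic: their product is $r^2$.

```python
from math import sqrt


def regression_coefficients(xs, ys):
    """Return r, the regression of y on x, and the regression of x on y."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    p = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
    sigma_x = sqrt(sum((x - xbar) ** 2 for x in xs) / n)
    sigma_y = sqrt(sum((y - ybar) ** 2 for y in ys) / n)
    r = p / (sigma_x * sigma_y)
    return r, r * sigma_y / sigma_x, r * sigma_x / sigma_y


xs = [1, 2, 3, 4, 5]   # hypothetical observations
ys = [2, 4, 5, 4, 5]

r, b_yx, b_xy = regression_coefficients(xs, ys)
print(round(b_yx, 4), round(b_xy, 4), round(r * r, 4))
```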

Example. To illustrate, let us take some figures due to Professor
Pearson and Dr. Alice Lee [Biometrika, vol. ii. pp. 357 et seq., On
the Laws of Inheritance in Man]. Suppose the mean stature of all
observed fathers, based on a sample of over 1000 observations,
= 67.68 in., with S.D. = 2.70 in.

Also suppose the mean stature of all sons = 68.65 in., with S.D.
= 2.71 in., and that the correlation r between stature of father
and stature of son = 0.514.

The regression of son on father as regards stature is then given by

$$(y - 68.65) = (0.514)\frac{2.71}{2.70}(x - 67.68),$$

where x is the height of selected fathers and y the mean height of
their sons.

Hence

$$y = 0.516x + 33.73,$$

so that if we selected fathers of height 70 in., for example, the
mean height of their sons would not be 70 in., but

$$(0.516)(70) + 33.73 = 69.85 \text{ in.},$$

i.e. there is a regression towards the general mean, 68.65 in., of
all sons.

Also the coefficient of regression

$$r\frac{\sigma_y}{\sigma_x} = (0.514)(2.71)/(2.70) = 0.516.$$
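
The arithmetic of this example is reproduced below from the figures quoted above; only the variable and function names are introduced here for illustration.

```python
# Figures quoted in the example (statures in inches).
mean_father, sd_father = 67.68, 2.70
mean_son, sd_son = 68.65, 2.71
r = 0.514

slope = r * sd_son / sd_father              # coefficient of regression of son on father
intercept = mean_son - slope * mean_father  # the line passes through the two means


def expected_son_height(father_height):
    """Mean height of sons of fathers of the given height, by the regression line."""
    return slope * father_height + intercept


print(round(slope, 3), round(intercept, 2), round(expected_son_height(70.0), 2))
```

Fathers 2.32 in. above their mean thus have sons who are, on the average, only about 1.2 in. above theirs.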

It is not difficult to show that the greatest numerical value r
can in general take is unity, for consider the expression for the
sum of the squares of the differences between the observed deviations
of the y characters from their mean and the corresponding
deviations as deduced from the best fitting regression line,

$$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x}).$$

If, with our previous notation, $\eta$ denote the observed deviation of
the one character y, associated with a particular deviation, $\xi$, of
the other character, x, then, since $(r\sigma_y/\sigma_x)\xi$ denotes the best value
given by the line, the sum of the squares of the differences between
these values

$$= \sum\left(\eta - r\frac{\sigma_y}{\sigma_x}\xi\right)^2 = \sum\eta^2 - 2r\frac{\sigma_y}{\sigma_x}\sum\xi\eta + r^2\frac{\sigma_y^2}{\sigma_x^2}\sum\xi^2$$
$$= n\sigma_y^2 - 2r\frac{\sigma_y}{\sigma_x}(nr\sigma_x\sigma_y) + r^2\frac{\sigma_y^2}{\sigma_x^2}(n\sigma_x^2)$$
$$= n\sigma_y^2(1 - r^2).$$

Since the sum of a number of squared quantities cannot be negative,
it follows that $r^2$ cannot exceed 1 and hence r lies between -1
and +1.
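
The identity on which this argument rests may be checked on any set of figures. The sketch below uses invented observations and compares the sum of squared differences, computed directly, with $n\sigma_y^2(1 - r^2)$.

```python
from math import sqrt


def residual_identity(xs, ys):
    """Return r, the direct sum of squared differences, and n*sigma_y^2*(1 - r^2)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    xi = [x - xbar for x in xs]     # deviations of x from its mean
    eta = [y - ybar for y in ys]    # deviations of y from its mean
    var_x = sum(v * v for v in xi) / n
    var_y = sum(v * v for v in eta) / n
    p = sum(u * v for u, v in zip(xi, eta)) / n
    r = p / sqrt(var_x * var_y)
    slope = r * sqrt(var_y / var_x)
    direct = sum((v - slope * u) ** 2 for u, v in zip(xi, eta))
    formula = n * var_y * (1 - r * r)
    return r, direct, formula


r, direct, formula = residual_identity([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4), round(direct, 4), round(formula, 4))
```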

Further, $n\sigma_y^2(1 - r^2)$ can only vanish if every one of the squared
quantities on the other side vanishes independently of the rest,
so that we only get $r = \pm 1$ when

$$\eta = \pm\frac{\sigma_y}{\sigma_x}\,\xi.$$

In  this  case  the  deviation  of  the  one  character  from  its  mean  is 
always  exactly  proportional  to  the  deviation  of  the  other  character 
from its mean, and the correlation is then said to be perfect, for
the one character is then completely determined by the other. In perfect correlation a one-to-one
correspondence  thus  exists  between  the  values  of  the  two  char- 
acters, for  to  one  value  of  either  there  corresponds  one  and  only 
one value of the other, and the standard deviation of the array
(measuring its variability) corresponding to any selected type
vanishes. 

Zero  correlation  is  at  the  opposite  extreme  where,  no  matter 
what  the  type  selected  in  the  one  character  may  be,  the  mean 
value of the array in the second character is unaffected, because
the two characters are quite independent or uncorrelated; the
deviation of y from its mean bears no relation at all to the deviation
of x from its mean, and unit change in either is associated with no
particular  change  in  the  other,  so  that  r  must  in  this  case  be  zero. 

When r is negative, since $(y - \bar{y})/(x - \bar{x}) = r\sigma_y/\sigma_x$ and the $\sigma$'s are
necessarily  positive,  corresponding  to  any  value  of  x  above  the 
mean  of  all  the  x's  the  best  value  of  (y—y)  is  negative,  that  is,  the 
best  value  of  y  is  below  the  mean  of  all  the  y's,  and  vice  versa. 
This  means  that  in  general  high  values  of  x  would  be  associated 
with  low  values  of  y,  and  vice  versa. 
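
The relation used above, that the squared differences from the regression line sum to $n\sigma_y^2(1-r^2)$, can be checked numerically. The following is only a sketch: the five pairs of observations are invented for the purpose, and the function name is ours, not the text's.

```python
import math

def residual_identity(xs, ys):
    """Return (sum of squared differences from the line, n*var_y*(1-r^2), r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations from the means (the mean is taken as origin, as in the text).
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    var_x = sum(d * d for d in dx) / n
    var_y = sum(d * d for d in dy) / n
    p = sum(a * b for a, b in zip(dx, dy)) / n     # mean product deviation
    r = p / math.sqrt(var_x * var_y)
    slope = r * math.sqrt(var_y / var_x)           # r * sigma_y / sigma_x
    lhs = sum((b - slope * a) ** 2 for a, b in zip(dx, dy))
    rhs = n * var_y * (1 - r * r)
    return lhs, rhs, r

# Invented observations, for illustration only: y falls as x rises.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.1, 3.9, 3.2, 1.8, 1.0]
lhs, rhs, r = residual_identity(xs, ys)
print(lhs, rhs, r)
```

The two sums agree to rounding error, and r comes out negative because high values of x go here with low values of y.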

If we take the mean as origin so that the regression lines become

$$y = r\frac{\sigma_y}{\sigma_x}\,x, \qquad x = r\frac{\sigma_x}{\sigma_y}\,y,$$

these lines coincide with the axes when the correlation is zero,
and with one another when r = ±1 and the correlation is perfect,
fig. (22). Given two equally variable characters ($\sigma_x = \sigma_y$) and
perfect correlation, the regression lines coincide with one of
the bisectors of the angle formed by the axes.

[Fig. (22). Regression Lines when Correlation is perfect (r = +1).]

It may be helpful to look back again now at the graphical view
of the argument leading up to the determination of the coefficient
of correlation. For successive values of x we calculated the means
of the several y's observed, these being presumably the best available
y's corresponding to the particular x's selected, and we assumed that,
when plotted, the points so obtained, $(x_1, \bar{y}_1), (x_2, \bar{y}_2), (x_3, \bar{y}_3), \ldots$,
lay roughly in a straight line. In the same way we calculated the
means of the several x's observed to correspond to particular y's
selected, and again we assumed that the resulting points,
$(\bar{x}_1, y_1), (\bar{x}_2, y_2), (\bar{x}_3, y_3), \ldots$, lay roughly in a straight line.
These assumptions are justified in very many cases, but when they fail recourse
must be had to other methods beyond the scope of this book. [See,
for example, Pearson's paper in Drapers' Company Research Memoirs,
Biometric Series ii., On the Theory of Skew Correlation and Non-linear
Regression, introducing the correlation ratio, η, which is
equal to r in the particular case when the regression is linear.]
Sometimes,  again,  although  the  observations  are  so  scattered  that 
the  assumption  of  a  straight  line  to  describe  the  best  fit  seems 
somewhat  wide  of  the  mark,  it  may  be  justified  on  the  ground  that 
no  better  graphical  result  would  be  given  by  using  any  other  curve 
in place of the line. Moreover the linear expression, y = mx + c,
is  simple  and  may  serve  to  give  at  all  events  the  first  two  terms  of 
some  more  complex  relation  supplying  an  estimate  for  the  most 
probable  y  corresponding  to  a  given  x. 

If we had plotted all the original pairs of observations, instead
of plotting certain x's and the mean y's associated with them, or
certain y's and the associated mean x's, the two lines of regression would
not have stood out so clearly: they would have lacked definition, like an
optical image which is not strictly in focus, but there would have been a
concentration of observations, as of light, in the neighbourhood where the
lines of regression intersect, namely at $(\bar{x}, \bar{y})$, the mean of all the x's and
all the y's. When, however, the lines of regression lie close together
they  become  more  clearly  defined,  all  the  observations  being  centred 
then  more  nearly  in  one  line,  and  the  correlation  tends  towards 
perfection.  Such  cases  are  frequent  in  Physics  but  rare,  if  found  at 
all,  in  that  class  of  Statistics  into  which  the  element  of  human 
impulse  enters.  When  r  is  less  than  1  the  lines  of  regression,  if  the 
regression  is  of  linear  type,  will  be  inclined  to  one  another  at  some 
angle  between  0  and  90  degrees. 

If  only  a  rough  value  of  r,  the  correlation  coefficient,  is  required, 
that  may  be  obtained  by  merely  estimating  the  gradient  of  each 
regression  line  and  multiplying  the  results  together,  one  measured 
relative  to  the  axis  of  x  and  the  other  relative  to  the  axis  of  y, 
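for this product

= (regression of y on x) (regression of x on y)

$= (r\sigma_y/\sigma_x)(r\sigma_x/\sigma_y) = r^2,$

and r is its square root, taken with the sign of the gradients.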


Such  an  estimate  may  also  be  useful,  though  it  may  not  be  very 
dependable,  when  the  complete  distribution  of  characters  is  not 
known,  for  either  regression  line  can  be  drawn  when  any  two  points 
on  it  are  known  and  a  single  array  of  values  of  either  character 
corresponding  to  a  given  type  of  the  other  is  sufficient  to  fix  one 
such point; also the mean $(\bar{x}, \bar{y})$, if it were known, would at once
give  a  point  common  to  both  regression  lines.  When  all  the  facts 
are  available,  however,  the  method  of  calculation  is  to  be  preferred 
to  that  of  simply  graphing  the  observations  and  their  means,  as  there 
is  bound  to  be  a  certain  amount  of  guesswork  and  consequent  error 
in  deciding  from  a  graph  how  the  best  regression  lines  run. 
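
The rough graphical estimate described above can be sketched in a few lines. The function name is ours, and the two gradients in the usage line (0.52 and 0.35) are those found for Example (1) of the next chapter; since the product of the gradients is $r^2$, the square root is taken and given the sign of the gradients.

```python
import math

def r_from_gradients(slope_y_on_x, slope_x_on_y):
    """Estimate r from the two regression gradients, whose product is r squared."""
    product = slope_y_on_x * slope_x_on_y
    if product < 0:
        # Both gradients carry the sign of r, so they cannot differ in sign.
        raise ValueError("the two gradients must have the same sign")
    r = math.sqrt(product)
    return -r if slope_y_on_x < 0 else r

# Gradients read from the regression equations of Example (1).
print(round(r_from_gradients(0.52, 0.35), 2))
```

As the text warns, an estimate of this kind is only as good as the gradients read from the graph.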

It is frequently convenient to refer the deviations of the given
variables to some point other than the mean $(\bar{x}, \bar{y})$ as origin, and,
when this is done, a correction must be applied to the resulting
value of r. We have already explained how, in such a case,
to correct for standard deviations, and, as $r = p/\sigma_x\sigma_y$, it only
remains to explain how to correct for p.

Now p is given by

$$np = \xi_1\eta_1 + \xi_2\eta_2 + \dots + \xi_n\eta_n,$$

where the ξ's and η's denote deviations from $\bar{x}$ and $\bar{y}$ respectively.
Fig. (23) indicates the changes necessary in transferring from some
origin O to the mean G. The co-ordinates of P (representing a
typical observation) referred to O are (x, y) and referred to G are
(ξ, η). Also the point G itself referred to O is $(\bar{x}, \bar{y})$. Thus

$$\xi = x - \bar{x}, \qquad \eta = y - \bar{y},$$

and np becomes

$$\begin{aligned}
&(x_1-\bar{x})(y_1-\bar{y}) + \dots + (x_n-\bar{x})(y_n-\bar{y}) \\
&= (x_1y_1 - \bar{x}y_1 - \bar{y}x_1 + \bar{x}\bar{y}) + \dots + (x_ny_n - \bar{x}y_n - \bar{y}x_n + \bar{x}\bar{y}) \\
&= (x_1y_1 + \dots + x_ny_n) - \bar{x}(y_1 + \dots + y_n) - \bar{y}(x_1 + \dots + x_n) + n\bar{x}\bar{y} \\
&= (x_1y_1 + \dots + x_ny_n) - \bar{x}\cdot n\bar{y} - \bar{y}\cdot n\bar{x} + n\bar{x}\bar{y} \\
&= \Sigma(xy) - n\bar{x}\bar{y},
\end{aligned}$$

where Σ(xy) denotes the sum of expressions of the type xy.
Hence the corrected value of p

$$= \Sigma(xy)/n - \bar{x}\bar{y}.$$

We proceed to a few applications of these results in the next
chapter.
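
The correction for p may be checked on a small case. In the sketch below the four pairs of observations and the working origin (5, 3) are invented for illustration, and the function names are ours; the mean product deviation found directly from the mean is compared with the value found from the working origin and then corrected.

```python
def mean_product_about_mean(xs, ys):
    """p computed directly from deviations measured from the mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def mean_product_corrected(xs, ys, origin_x, origin_y):
    """p computed from a working origin, then corrected: sum(xy)/n - xbar*ybar."""
    n = len(xs)
    u = [x - origin_x for x in xs]   # deviations from the working origin
    v = [y - origin_y for y in ys]
    mean_u = sum(u) / n              # the mean referred to the working origin
    mean_v = sum(v) / n
    return sum(a * b for a, b in zip(u, v)) / n - mean_u * mean_v

# Invented observations and working origin, for illustration only.
xs = [2.0, 4.0, 7.0, 11.0]
ys = [1.0, 3.0, 2.0, 8.0]
direct = mean_product_about_mean(xs, ys)
corrected = mean_product_corrected(xs, ys, 5.0, 3.0)
print(direct, corrected)
```

Whatever working origin is chosen, the corrected value agrees with the direct one, which is why the origin may be picked purely for ease of arithmetic.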

[As  early  as  1846  a  French  physicist,  Auguste  Bravais,  had  conceived  the 
surface  of  error  as  a  means  of  describing  in  space  the  path  of  a  point  whose  x 
and  y  co-ordinates  are  subject  to  errors  which  are  not  independent.  It  is 
astonishing  that  although  his  work  really  embraces  the  fundamentals  of  the 
theory  of  correlation  as  afterwards  developed,  it  lay  dormant  for  nearly  forty 
years until Sir Francis Galton introduced on graphical lines an improved notation
(Galton's function, or the coefficient of correlation) and gave practical
examples  of  its  use. 

A  little  later  Edgeworth  (1892),  using  Galton's  function,  independently 
reached  some  of  Bravais'  results  for  the  correlation  of  three  variables,  and 
showed  how  they  could  be  extended.  Karl  Pearson,  in  1896,  contributed 
to  the  Royal  Society  Transactions  a  fundamental  paper  on  the  subject,  with 
special  reference  to  the  problem  of  heredity,  drawing  attention  to  the  best 
value  of  the  correlation  coefficient,  and  how  it  should  be  calculated.  (See 
Appendix, Note 11.) Yule, returning in the following year to Bravais' formulae,
showed their significance also in the case of skew correlation.

Pearson  afterwards  developed  a  method  of  determining  the  correlation  of 
characters  not  quantitatively  measurable,  and  in  a  discussion  of  the  general 
theory of skew correlation in another paper he proposed a new function, the
correlation ratio, applicable to the case of non-linear regression.]

CORRELATION: EXAMPLES


Example (1). To find the correlation between Differences in Wholesale
Price Index Numbers and in the Marriage Rate from their corresponding
Nine-yearly Averages during the twenty years, 1889-1908,
using the data given on p. 77.

Table (26). Correlation between Differences in Wholesale
Prices and Marriage Rate from their respective Nine-
yearly Averages.

Col. (1): Year.
Col. (2): Difference in Prices from 9-yearly Average (x).
Col. (3): Square of No. in Col. (2) (x²).
Col. (4): Difference in Marriage-rate from 9-yearly Average (y).
Col. (5): Square of No. in Col. (4) (y²).
Col. (6): Product of Nos. in Col. (2) and Col. (4).

  (1)         (2)          (3)        (4)        (5)          (6)
 1889        +0.9         0.81        +1          1          +0.9
 1890        +2.3         5.29        +6         36         +13.8
 1891        +7.0        49.00        +6         36         +42.0
 1892        +2.4         5.76        +3          9          +7.2
 1893        +2.0         4.00        -6         36         -12.0
 1894        -2.8         7.84        -5         25         +14.0
 1895        -4.3        18.49        -6         36         +25.8
 1896        -6.1        37.21        +1          1          -6.1
 1897        -3.7        13.69        +3          9         -11.1
 1898        -0.2         0.04        +4         16          -0.8
 1899        -1.6         2.56        +6         36          -9.6
 1900        +5.3        28.09        +1          1          +5.3
 1901        +1.0         1.00        ..         ..           ..
 1902        -0.5         0.25        +1          1          -0.5
 1903        -1.4         1.96        -1          1          +1.4
 1904        -1.3         1.69        -3          9          +3.9
 1905        -2.4         5.76        -2          4          +4.8
 1906        -0.5         0.25        +3          9          -1.5
 1907        +3.2        10.24        +6         36         +19.2
 1908        -1.8         3.24        -2          4          +3.6
 Totals   +24.1 -26.6   197.17     +41 -25      306      +141.9 -41.6

The arithmetic is comparatively simple in this case because
there  is  only  one  value  of  each  variable  corresponding  to  each  year, 
so  that  there  is  no  weighting  or  grouping  to  complicate  the  analysis. 
The  variables  x  and  y,  between  which  we  wish  to  find  the  correlation, 
appear  in  col.  (2)  and  col.  (4)  in  Table  (26),  and  the  positive  and 
negative  differences  are  separated  from  one  another  in  each  case 
so  as  to  make  their  summation  easier. 

Thus for the arithmetic mean of the numbers in col. (2), we have

$$\bar{x} = (+24.1 - 26.6)/20 = -0.125;$$

and for the mean of the numbers in col. (4), we have

$$\bar{y} = (+41 - 25)/20 = +0.8.$$

The straightforward procedure would now be to get the twenty
corresponding values of ξ and η, the deviations of the twenty x's
in col. (2) and of the twenty y's in col. (4) from $\bar{x}$ and $\bar{y}$ respectively,
and, having found $\sigma_x$ and $\sigma_y$, we could immediately deduce r from
the formula

$$r = p/\sigma_x\sigma_y.$$

But it is simpler to measure the deviations from (0, 0) as origin
rather than from the mean (-0.125, +0.8), because $x^2$, $y^2$, and $xy$
involve fewer significant figures than would $\xi^2$, $\eta^2$, and $\xi\eta$, and,
of course, it will be necessary to correct for this at the end in the
usual way.

The mean square deviation of x referred to zero as origin
= 197.17/20, by col. (3).

Therefore, $\sigma_x^2 = 197.17/20 - (0.125)^2 = 9.843$,

$\sigma_x = 3.14$.

Again, the mean square deviation of y referred to zero as origin
= 306/20, by col. (5).

Therefore, $\sigma_y^2 = 306/20 - (0.8)^2 = 14.66$,

$\sigma_y = 3.83$.

Also the corrected p

$$\begin{aligned}
&= \Sigma(xy)/n - \bar{x}\bar{y} \\
&= 100.3/20 - (-0.125)(+0.8), \text{ by col. (6)} \\
&= 5.015 + 0.100 \\
&= 5.115.
\end{aligned}$$

Hence $r = p/\sigma_x\sigma_y = 5.115/(3.14)(3.83) = 0.43$.

It is necessary to be careful with the signs in forming the numbers
in col. (6), but otherwise the actual calculation should present no
difficulty.
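
The same arithmetic can be run mechanically. The sketch below takes the twenty pairs as read from cols. (2) and (4) of Table (26); the entry for 1901 is read as +1.0 with a marriage-rate difference of zero, which is the reading consistent with the column totals. The variable names are ours.

```python
import math

# Differences from the nine-yearly averages, 1889-1908, as read from Table (26).
x = [0.9, 2.3, 7.0, 2.4, 2.0, -2.8, -4.3, -6.1, -3.7, -0.2,
     -1.6, 5.3, 1.0, -0.5, -1.4, -1.3, -2.4, -0.5, 3.2, -1.8]
y = [1, 6, 6, 3, -6, -5, -6, 1, 3, 4,
     6, 1, 0, 1, -1, -3, -2, 3, 6, -2]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Mean squares and mean product referred to (0, 0), each corrected to the mean.
var_x = sum(a * a for a in x) / n - mean_x ** 2
var_y = sum(b * b for b in y) / n - mean_y ** 2
p = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y

r = p / math.sqrt(var_x * var_y)
print(round(mean_x, 3), round(mean_y, 1), round(p, 3), round(r, 2))
```

Carried to two places the coefficient agrees with the value 0.43 found by hand.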

The regression equation giving the best marriage rate difference,
Y, for a given wholesale price difference, X, from their respective
nine-yearly averages is

$$(Y - 0.8) = r\frac{\sigma_y}{\sigma_x}(X + 0.125) = (0.43)\frac{3.83}{3.14}(X + 0.125),$$

i.e. Y = 0.52X + 0.86.

The regression equation giving the best wholesale price difference,
X, for a given marriage rate difference, Y, from their respective
nine-yearly averages is

$$(X + 0.125) = r\frac{\sigma_x}{\sigma_y}(Y - 0.8) = 0.35(Y - 0.8),$$

i.e. X = 0.35Y - 0.40.
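
Both regression equations follow from the same five summary figures, and the step can be sketched as below. The function and its argument names are ours; the values passed in are the rounded ones found above for Example (1).

```python
def regression_line(r, mean_from, mean_to, sd_from, sd_to):
    """Gradient and intercept of the line giving the best 'to' value
    for a given 'from' value: (to - mean_to) = r*sd_to/sd_from * (from - mean_from)."""
    gradient = r * sd_to / sd_from
    intercept = mean_to - gradient * mean_from
    return gradient, intercept

# Rounded summary values from Example (1).
r, mean_x, mean_y, sd_x, sd_y = 0.43, -0.125, 0.8, 3.14, 3.83

b_yx, a_yx = regression_line(r, mean_x, mean_y, sd_x, sd_y)  # Y on X
b_xy, a_xy = regression_line(r, mean_y, mean_x, sd_y, sd_x)  # X on Y
print(round(b_yx, 2), a_yx)
print(round(b_xy, 2), a_xy)
```

The gradients round to 0.52 and 0.35 as in the text. The constant terms may differ from the printed ones by a unit in the last figure, according as the gradient is rounded before or after the constant is worked out.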

We  noted  that  fig.  (10),  p.  80,  suggested  a  closer  correlation 
between  the  two  factors  we  have  been  considering  during  the 
earlier  years  of  the  period  1875-1908  than  during  the  later  years. 
It  might  be  worth  while  as  an  exercise  to  see  if  this  is  borne  out 
by  calculating  r  for  the  years  1875-1889,  and  comparing  it  with 
the  value  found  for  the  years  1889-1908. 

Example  (2). — To  find  the  correlation  between  Overcrowding  and 
Infant  Mortality  in  London  Districts.  [Data  taken  from  London 
Statistics,  vol.  23,  published  by  the  London  County  Council.] 

The  figures  are  apparently  based  upon  the  Census  Report  of 
1911.  The  numbers  in  col.  (2),  Table  (27),  show  what  percentage  of 
the  total  population  occupying  private  houses  in  each  district  were 
living  in  overcrowded  conditions,  any  ordinary  tenement  which 
has  more  than  two  occupants  to  a  room,  including  bedrooms  and 
sitting-rooms,  being  defined  as  overcrowded.  The  numbers  in 
col.  (5)  show  the  infantile  mortality  in  each  district,  that  is,  the 
number  of  infants  who  died  under  one  year  out  of  every  1000 
born,  including  both  sexes. 

For the sake of comparison these numbers have been plotted
together on the same graph sheet. The districts, arranged in
alphabetical order, were numbered from 1 to 29 so as to form a
horizontal scale corresponding to the scale of years in discussing prices
and marriages. The scale in this case is, of course, purely artificial,
and the only reason for joining up neighbouring points is that we are
better able by so doing to see whether or not high values of the one
variable go with high values of the other variable, and low with low.

In calculating the mean and standard deviation for overcrowding
we have measured deviations from 17.0 as origin, and in making the
same calculations for infant mortality we have measured deviations
from 125 as origin. It is convenient, therefore, to use the
point (17.0, 125) as origin in working out also the product deviation
sum, col. (8) of Table (27), instead of using the mean (17.86, 126).


Table (27). Correlation between Overcrowding and
Infant Mortality in London Districts (1911).

Col. (1): District.
Col. (2): Percentage of Population Overcrowded.
Col. (3): Deviation of No. in Col. (2) from 17.0 (x).
Col. (4): Square of No. in Col. (3).
Col. (5): Infant Mortality.
Col. (6): Deviation of No. in Col. (5) from 125 (y).
Col. (7): Square of No. in Col. (6).
Col. (8): Product of Nos. in Col. (3) and Col. (6).

 (1)                        (2)      (3)       (4)     (5)     (6)      (7)       (8)
 (1) Battersea             13.3     -3.7     13.69    124      -1        1      +3.7
 (2) Bermondsey            23.4     +6.4     40.96    156     +31      961    +198.4
 (3) Bethnal Green         33.2    +16.2    262.44    151     +26      676    +421.2
 (4) Camberwell            13.5     -3.5     12.25    109     -16      256     +56.0
 (5) Chelsea               14.9     -2.1      4.41    109     -16      256     +33.6
 (6) City of London        12.3     -4.7     22.09    124      -1        1      +4.7
 (7) Deptford              12.2     -4.8     23.04    142     +17      289     -81.6
 (8) Finsbury              39.8    +22.8    519.84    156     +31      961    +706.8
 (9) Fulham                14.6     -2.4      5.76    125      ..       ..        ..
 (10) Greenwich            12.1     -4.9     24.01    128      +3        9     -14.7
 (11) Hackney              12.4     -4.6     21.16    119      -6       36     +27.6
 (12) Hammersmith          14.2     -2.8      7.84    146     +21      441     -58.8
 (13) Hampstead             7.1     -9.9     98.01     78     -47     2209    +465.3
 (14) Holborn              25.6     +8.6     73.96    115     -10      100     -86.0
 (15) Islington            20.0     +3.0      9.00    127      +2        4      +6.0
 (16) Kensington           17.1     +0.1      0.01    133      +8       64      +0.8
 (17) Lambeth              13.6     -3.4     11.56    123      -2        4      +6.8
 (18) Lewisham              3.9    -13.1    171.61    104     -21      441    +275.1
 (19) Paddington           16.2     -0.8      0.64    127      +2        4      -1.6
 (20) Poplar               20.6     +3.6     12.96    157     +32     1024    +115.2
 (21) St. Marylebone       20.7     +3.7     13.69    108     -17      289     -62.9
 (22) St. Pancras          25.5     +8.5     72.25    112     -13      169    -110.5
 (23) Shoreditch           36.6    +19.6    384.16    170     +45     2025    +882.0
 (24) Southwark            25.8     +8.8     77.44    144     +19      361    +167.2
 (25) Stepney              35.0    +18.0    324.00    144     +19      361    +342.0
 (26) Stoke Newington       8.8     -8.2     67.24    102     -23      529    +188.6
 (27) Wandsworth            6.3    -10.7    114.49    122      -3        9     +32.1
 (28) Westminster          12.9     -4.1     16.81    103     -22      484     +90.2
 (29) Woolwich              6.3    -10.7    114.49     97     -28      784    +299.6
 Totals                         +119.3 -94.4  2519.81        +256 -226  12748  +4322.9 -416.1

For overcrowding,

mean = 17 + 24.9/29 = 17.86;

$\sigma_x = \sqrt{(2519.81/29) - (0.86)^2} = \sqrt{86.15} = 9.3.$

For infant mortality,

mean = 125 + 30/29 = 126.03;

$\sigma_y = \sqrt{(12748/29) - (1.03)^2} = \sqrt{438.5} = 20.9.$

Also p, referred to (17.0, 125) = (4322.9 - 416.1)/29 = 3907/29, and,
referred to the mean (17.86, 126.03), this becomes
= 3907/29 - (0.86)(1.03)
= 133.8.
Hence r = 133.8/(9.3)(20.9) = 0.69,

so  that  the  correlation  between  overcrowding  and  infant  mortality 
is  fairly  marked. 
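
The working-origin method used here can be sketched as follows, with the figures for the 29 districts as read from cols. (2) and (5) of Table (27). The function name is ours, and the origin (17.0, 125) is the one chosen in the text.

```python
import math

# Percentage overcrowded and infant mortality for the 29 districts,
# in the alphabetical order of Table (27).
overcrowding = [13.3, 23.4, 33.2, 13.5, 14.9, 12.3, 12.2, 39.8, 14.6, 12.1,
                12.4, 14.2, 7.1, 25.6, 20.0, 17.1, 13.6, 3.9, 16.2, 20.6,
                20.7, 25.5, 36.6, 25.8, 35.0, 8.8, 6.3, 12.9, 6.3]
mortality = [124, 156, 151, 109, 109, 124, 142, 156, 125, 128,
             119, 146, 78, 115, 127, 133, 123, 104, 127, 157,
             108, 112, 170, 144, 144, 102, 122, 103, 97]

def correlation_from_working_origin(xs, ys, origin_x, origin_y):
    """Means and r, with all deviations measured from a convenient origin
    and the mean squares and mean product then corrected to the mean."""
    n = len(xs)
    u = [a - origin_x for a in xs]
    v = [b - origin_y for b in ys]
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    var_x = sum(a * a for a in u) / n - mean_u ** 2
    var_y = sum(b * b for b in v) / n - mean_v ** 2
    p = sum(a * b for a, b in zip(u, v)) / n - mean_u * mean_v
    return origin_x + mean_u, origin_y + mean_v, p / math.sqrt(var_x * var_y)

mean_x, mean_y, r = correlation_from_working_origin(
    overcrowding, mortality, 17.0, 125)
print(round(mean_x, 2), round(mean_y, 2), round(r, 2))
```

Any other origin would give the same means and the same r; (17.0, 125) merely keeps the deviations small.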


[Fig. (24). Horizontal scale: Numbers representing various London Districts.]

The regression equation giving the average infant mortality, Y,
for districts in which the extent of overcrowding, X, is known is

$$Y - 126.03 = r\frac{\sigma_y}{\sigma_x}(X - 17.86) = \frac{(0.69)(20.9)}{9.3}(X - 17.86),$$

i.e. Y = 1.55X + 98.4.

Similarly, the regression equation giving the average percentage
of overcrowding, X, for districts with a known amount of infant
mortality, Y, is

$$X - 17.86 = r\frac{\sigma_x}{\sigma_y}(Y - 126.03) = 0.31(Y - 126.03),$$

i.e. X = 0.31Y - 21.0.

Example (3). The reader might apply the same method to the
determination of the correlation between Ratio of Indoor Paupers
and Ratio of Outdoor Paupers, each measured per 1000 of the estimated
Population in England and Wales, excluding casuals and
insane, during the years 1900-1914. The following are the statistics
required for the purpose:


Table (28). Correlation between Ratio of Indoor and Ratio
of Outdoor Paupers, each measured per 1000 of the
Population.

          Indoor          Outdoor                   Indoor          Outdoor
 Year.    Paupers:        Paupers:         Year.    Paupers:        Paupers:
          Rate per 1000.  Rate per 1000.            Rate per 1000.  Rate per 1000.

 1900        5.9            15.8           1908        6.8            16.4
 1901        5.8            15.3           1909        7.1            15.6
 1902        6.0            15.3           1910        7.2            15.1
 1903        6.2            15.4           1911        7.2            14.1
 1904        6.3            15.4           1912        6.9            11.2
 1905        6.6            16.1           1913        6.7            11.1
 1906        6.8            16.0           1914        6.4            10.4
 1907        6.8            15.6

The  coefficient  of  correlation  in  this  case  comes  out  negative 
and = -0.15, but it is very small and probably not significant.
If  it  were,  it  would  imply  that  as  indoor  pauperism  diminishes 
outdoor  pauperism  increases,  and  vice  versa. 

Example (4). To find the correlation between the Number of
Cattle and the Number of Acres of Permanent Grass-land in the
Coal-Producing Counties of England (1915).

A  Government  Report  was  consulted  giving  the  acreage  under 
crops  and  grass  and  the  number  of  live  stock  in  each  petty  sessional 
division  in  the  country,  as  returned  on  4th  June  1915,  and  the 
counties  included  were  those  which  appear  in  the  coal-mining 
reports  published  monthly  in  the  Labour  Gazette. 

In  each  county  the  petty  sessional  divisions  with  the  greatest 
and  the  least  numbers  of  cattle  and  of  acres  of  grass-land  were 
noted,  the  numbers  being  written  down  to  the  nearest  1000,  and, 
after  a  rough  examination  of  the  range  of  these  variables  from 
county to county, suitable class intervals were chosen and a table
of double entry was drawn up, Table (29), with an empty square
ready for each possible pair of variables.

Table (29). Correlation between the Number of Cattle
and the Number of Acres of Permanent Grass-land in
the Coal-Producing Counties of England (1915).

Columns: Total Head of Cattle (expressed to nearest thousand), x.
Rows: Acres of Permanent Grass-land (expressed to nearest thousand), y.
Entries in each occupied square, separated by strokes: product deviation / frequency / product of the deviation by the frequency.

 y \ x     0-5        5-10       10-15      15-20      20-25      25-30      30-35      35-40     Totals  Mean x
 0-5     10/15/150                                                                                   15     2.50
 5-10    8/27/216   4/3/12                                                                           30     3.00
 10-15   6/30/180   3/18/54                                                                          48     4.37
 15-20   4/3/12     2/30/60                                                                          33     7.04
 20-25              1/25/25    0/5/0                                                                 30     8.33
 25-30   0/1/0      0/14/0     0/9/0      0/2/0                                                      26     9.81
 30-35              -1/6/-6    0/22/0     1/3/3                                                      31    12.02
 35-40              -2/1/-2    0/12/0     2/6/12     4/4/16                                          23    15.33
 40-45                         0/3/0      3/4/12                9/1/9                                 8    16.87
 45-50                         0/3/0      4/3/12     8/3/24                16/1/16                   10    19.00
 50-55                                    5/4/20     10/4/40    15/1/15                               9    20.83
 55-60                                    6/1/6      12/2/24               24/1/24    30/1/30         5    26.50
 60-65                                                          21/1/21                               1    27.50
 65-70                                                          24/1/24                               1    27.50
 70-75                                                          27/1/27                               1    27.50
 75-80                                                                     40/3/120                   3    32.5
 80-85                                    11/1/11    22/1/22                                          2    20.0
 Totals    76         97         54         24         14         5          5          1           276
 Mean y   9.14      20.13      33.24      43.33      50.00      59.50      67.50      57.5

Each petty sessional division was then considered in turn and a
dot was inserted in the particular square applicable to it: e.g. a
petty sessional division with 42,000 acres of grass-land and feeding
19,000 cattle would be represented by a dot in the square defined
by row (40-45) and col. (15-20) in Table (29); x was used to represent
the number of cattle and y the number of acres of grass-land
in any division, each expressed to the nearest 1000 units. All the
dots were ultimately added in each square giving the frequency
for each corresponding pair of variables, and these frequencies were
recorded in the centres of the squares to which they applied: e.g.
the frequency of petty sessional divisions stocking 10 to 15 thousand
cattle and with 30 to 35 thousand acres under permanent grass
was 22. The total frequency for each row, i.e. each array of
selected y type, was also noted, in the column at the end of the
rows: e.g. altogether 31 petty sessional divisions were observed of
the type having 30 to 35 thousand acres of land under permanent
grass. Likewise the total frequency for each column, i.e. each
array of selected x type, was noted in the row at the foot of the
columns: e.g. altogether 54 divisions were observed of the type
stocking 10 to 15 thousand head of cattle.

It was possible now to treat each column separately and to
calculate the mean y's associated with different types of x, namely
$x_1, x_2, x_3, \ldots$, and the means so obtained were inserted in
the bottom row of Table (29): e.g. when x lies between 20 and 25
thousand, the mean value of y is 50 thousand. The resulting
points, $(x_1, \bar{y}_1), (x_2, \bar{y}_2), (x_3, \bar{y}_3), \ldots$ in the notation of Chapter x.,
are plotted together in fig. (25), and they are seen to lie approximately
in a straight line. The successive rows were treated in
precisely the same way and the mean x's calculated corresponding
to y's of different types, namely $y_1, y_2, y_3, \ldots$, the means
obtained being recorded in the extreme right-hand column of
Table (29): e.g. when y lies between 45 and 50 thousand, the mean
value of x is 19 thousand. The resulting points $(\bar{x}_1, y_1), (\bar{x}_2, y_2),
(\bar{x}_3, y_3), \ldots$, are also plotted in fig. (25), and, excepting for values
which depend upon only one or two records, they too lie roughly
in a straight line which is not far from coinciding with the previous
one, so that we shall expect on calculation to get a high value for
the coefficient of correlation.

In  order  to  calculate  r  we  need  first  to  find  the  mean  and  standard 
deviation  for  each  variable.  For  this  let  us  take  as  origin  the 
point (12.5, 27.5). The essential details are shown immediately
below the relative Tables (30) and (31).

Table (30). Distribution of Petty Sessional Divisions according
to the Head of Cattle (expressed to nearest 1000) stocked.

Col. (1): No. of Cattle stocked (in thousands) (x).
Col. (2): Deviation from 12.5.
Col. (3): No. of Petty Sessional Divisions.
Col. (4): Product of Nos. in Cols. (2) & (3).
Col. (5): Product of Nos. in Cols. (2) & (4).

  (1)       (2)      (3)      (4)      (5)
  0-5       -2        76     -152      304
  5-10      -1        97      -97       97
 10-15       0        54       ..       ..
 15-20      +1        24      +24       24
 20-25      +2        14      +28       56
 25-30      +3         5      +15       45
 30-35      +4         5      +20       80
 35-40      +5         1       +5       25
 Totals              276     -157      631

Mean number of cattle = 12.5 - (157/276) × 5 = 9.66, since $\bar{x} = -157/276$ class
units referred to 12.5 as origin; and

$$\sigma_x = 5\sqrt{631/276 - (157/276)^2} = 5\sqrt{1.963} = 7.00.$$

[The  numbers  in  col.  (4)  may  be  spoken  of  as  the  first  moments 
of  the  totals  of  x  arrays  and  the  numbers  in  col.  (5)  as  the  second 
moments.] 
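
The first and second moments of Table (30) lead to the mean and standard deviation as sketched below. The function name is ours; the deviations are in class units of 5 thousand, measured from the working origin 12.5, as in cols. (2) and (3) of the table.

```python
import math

def grouped_mean_sd(deviations, frequencies, origin, class_width):
    """Mean and standard deviation of grouped data from deviations in class units."""
    n = sum(frequencies)
    first = sum(d * f for d, f in zip(deviations, frequencies))       # first moment
    second = sum(d * d * f for d, f in zip(deviations, frequencies))  # second moment
    mean_units = first / n
    var_units = second / n - mean_units ** 2
    return origin + class_width * mean_units, class_width * math.sqrt(var_units)

# Cols. (2) and (3) of Table (30): head of cattle, classes 0-5 up to 35-40.
deviations = [-2, -1, 0, 1, 2, 3, 4, 5]
frequencies = [76, 97, 54, 24, 14, 5, 5, 1]

mean_cattle, sd_cattle = grouped_mean_sd(deviations, frequencies, 12.5, 5)
print(round(mean_cattle, 2), round(sd_cattle, 2))
```

The same function, given the deviations and frequencies for the acreage classes with 27.5 as origin, serves equally for the second variable.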

In order to calculate easily the product deviation with reference
to (12.5, 27.5) as origin, the value proper to each square was inserted
just above the frequency and the product of the deviation by the
frequency was inserted just below the frequency in different type of
print to prevent confusion: e.g. the row (50-55) is +5 class intervals
distant from the row (25-30) containing the origin, and the column
(20-25) is +2 class intervals distant from the column (10-15) containing
the origin; hence, for the particular square defined by this
row and this column, the product deviation = 5 × 2 = 10; also
the frequency recorded in this square = 4, so that it supplies a
term 10 × 4 to the product deviation; the numbers 10, 4, and 40
are therefore the numbers which appear in the square. It is necessary
to be careful with the signs; if the product deviation is to
be positive, the separate deviations must be of like sign, both
positive or both negative: hence they must either be both above
or both below the numbers 12.5 and 27.5 respectively from which
they are measured. In this instance there are only two negative
terms among the product deviations in the whole table.

Table (31). Distribution of Petty Sessional Divisions according
to the Number of Acres of Land (expressed to
nearest 1000) under Permanent Grass.

Col. (1): No. of Acres under Grass (in thousands) (y).
Col. (2): Deviation from 27.5.
Col. (3): No. of Petty Sessional Divisions.
Col. (4): Product of Nos. in Cols. (2) & (3).
Col. (5): Product of Nos. in Cols. (2) & (4).

  (1)       (2)      (3)      (4)      (5)
  0-5       -5        15      -75      375
  5-10      -4        30     -120      480
 10-15      -3        48     -144      432
 15-20      -2        33      -66      132
 20-25      -1        30      -30       30
 25-30      ..        26       ..       ..
 30-35      +1        31      +31       31
 35-40      +2        23      +46       92
 40-45      +3         8      +24       72
 45-50      +4        10      +40      160
 50-55      +5         9      +45      225
 55-60      +6         5      +30      180
 60-65      +7         1       +7       49
 65-70      +8         1       +8       64
 70-75      +9         1       +9       81
 75-80     +10         3      +30      300
 80-85     +11         2      +22      242
 Totals              276     -143     2945

Mean number of acres = 27.5 - (143/276) × 5 = 24.91, since $\bar{y} = -143/276$
class units; and

$$\sigma_y = 5\sqrt{2945/276 - (143/276)^2} = 5\sqrt{10.402} = 16.12.$$

[The numbers in col. (4) are the first moments of the totals of y
arrays, and the numbers in col. (5) are the second moments.]

It is now a simple matter to sum the product deviation terms,
taking each column (or each row) in turn: e.g. the first column
gives

150 + 216 + 180 + 12 = 558;

the second column gives

12 + 54 + 60 + 25 - 6 - 2 = 143,

and so on; and, summing these results together, we get

558 + 143 + 76 + 126 + 96 + 160 + 30 = 1189.

But this is the sum of all the product deviations referred to
(12.5, 27.5) as origin. Transferring now to the mean, we have

$$p = \frac{1189}{276} - \left(-\frac{157}{276}\right)\left(-\frac{143}{276}\right) = 4.013, \text{ expressed in class units.}$$

Hence, $r = p/\sigma_x\sigma_y$,

where $\sigma_x$ and $\sigma_y$ are also to be expressed in class units,

$$= 4.013/\sqrt{1.963}\,\sqrt{10.402} = 0.89,$$

a result not far from unity, so that the correlation is high.
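
For grouped data the whole calculation rests on five sums in class units, and it may be sketched thus. The function and argument names are ours; the sums are the totals of Tables (30) and (31) and the product deviation sum 1189 found above.

```python
import math

def grouped_r(n, sum_dx, sum_dx2, sum_dy, sum_dy2, sum_dxdy):
    """p and r from sums of deviations in class units about a working origin."""
    mean_dx = sum_dx / n
    mean_dy = sum_dy / n
    var_x = sum_dx2 / n - mean_dx ** 2       # in class units squared
    var_y = sum_dy2 / n - mean_dy ** 2
    p = sum_dxdy / n - mean_dx * mean_dy     # corrected to the mean
    return p, p / math.sqrt(var_x * var_y)

# Totals from Tables (30) and (31), and the product deviation sum.
p, r = grouped_r(n=276, sum_dx=-157, sum_dx2=631,
                 sum_dy=-143, sum_dy2=2945, sum_dxdy=1189)
print(round(p, 3), round(r, 2))
```

Since p and both standard deviations are all in class units, the class width cancels and r needs no further conversion.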

The regression of 'acreage of grassland' (Y) on 'head of cattle'
(X) is given by

$$(Y - 24.91) = r\frac{\sigma_y}{\sigma_x}(X - 9.66) = (0.89)\frac{(16.12)}{(7.00)}(X - 9.66),$$

i.e. Y = 2.05X + 5.11.

The points representing the mean y's for x's of different types
should  lie  close  to  this  line  which  is  shown  in  fig.  (25).  This  equation 
enables  us  to  predict  the  acreage  under  permanent  grass  to  be 
found  on  the  average  in  petty  sessional  divisions  with  a  given  total 
head  of  cattle  in  each.  The  words  '  on  the  average,'  to  be  tacitly 
understood  even  if  not  stated  in  all  such  cases,  are  emphasised 
because  the  prediction  relates  to  the  whole  array  of  divisions  of  a 
particular  type,  and  as  it  only  professes  to  give  the  mean  or  most 
likely  result  it  is  not  to  be  pronounced  worthless  if  it  fails  in  an 
individual  trial  with  a  selected  division. 

Again, the regression of X on Y is given by

$(X - 9.66) = r\frac{\sigma_x}{\sigma_y}(Y - 24.91)$,

i.e. $X = 0.39Y + 0.05$,

which  tells  us  the  total  head  of  cattle  (X)  to  be  found  on  the  average 
in  petty  sessional  divisions  when  the  acreage  under  permanent 
grass  (Y)  is  known.     This  line  is  also  drawn  in  fig.  (25). 
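
As a rough check on the two regression lines, the following Python sketch recomputes slope and intercept from the quoted means, S.D.'s and r. The function name `regression_line` is illustrative only. Since r is entered as the rounded value 0.89, the small intercept of the second line comes out a little different from the printed 0.05.

```python
def regression_line(mean_dep, mean_ind, sd_dep, sd_ind, r):
    """Slope and intercept of the regression of the dependent on the independent variable."""
    slope = r * sd_dep / sd_ind
    intercept = mean_dep - slope * mean_ind
    return slope, intercept

# Y = acreage of grassland, X = head of cattle (values quoted in the text)
mean_x, mean_y = 9.66, 24.91
sd_x, sd_y = 7.00, 16.12
r = 0.89

slope_yx, intercept_yx = regression_line(mean_y, mean_x, sd_y, sd_x, r)
slope_xy, intercept_xy = regression_line(mean_x, mean_y, sd_x, sd_y, r)

print(f"Y = {slope_yx:.2f} X + {intercept_yx:.2f}")
print(f"X = {slope_xy:.2f} Y + {intercept_xy:.2f}")
```

The product of the two slopes equals r squared, which gives a quick consistency test for any pair of regression equations.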

[Fig. (25). Horizontal axis, X: Total Head of Cattle (expressed to
nearest thousand), marked from 0 to 40; vertical axis marked from 10 to 70.]

Example (5). The data for this example are taken from an
exceedingly interesting Government Report on the Cost of Living
of the Working Classes (Report of an Inquiry by the Board of Trade
into Working Class Rents and Retail Prices, together with the Rates
of Wages in certain Occupations in Industrial Towns of the United
Kingdom in 1912 in continuation of a similar Inquiry in 1905,
Cd. 6955). Some further particulars concerning this Report will
be found on p. 281.

The towns included in the inquiry numbered 93, but in five
instances it was found desirable to consider closely adjacent
municipalities as single towns, thus reducing the number of town-units
to 88, namely 72 in England, 10 in Scotland, and 6 in Ireland. In
the example which follows the three zones of London, middle,
inner, and outer, have been treated as separate towns, so making
the net number of town-units 90. This number is too small to
allow any real value to be attached to our results, but the fewness
of the observations makes them easier to deal with as an illustration
of method.

We begin as before by choosing convenient class intervals for
the two factors we propose to consider, namely, Increment of
Unskilled Wages and Increment of Rents (by increment in each case
is meant the percentage increase (+) or decrease (-) between
1905 and 1912), and then form a correlation table. In the last
example separate tables were drawn up to find means and S.D.'s,
but that was only done in order to keep the argument clear at its
first presentment: generally we may dispense with these additional
tables and show all the working in one (see Table (32)).

The increment of wages runs from (-2.5) per cent. to (+11.5)
per cent., so that, if we take (-0.5) as origin and a difference of
2 per cent. as unit, the classes run from (-1) to (+6), these numbers
being shown in different type in the table, but in the same
compartments as the others. In the fourth row from the bottom
are shown the total frequencies for x arrays from class (-1) to
class (+6), and in the row just below it these several frequencies
are shown multiplied by their corresponding deviations measured
from (-0.5) as origin in terms of the class unit; the resulting
numbers give the first moments of the totals of x arrays. These
numbers, multiplied again by their corresponding deviations, give
the second moments of the totals of x arrays, and appear in the
last row but one of the table.

We deal in exactly the same way with increment of rents: a
percentage increment of (-1) is taken as origin from which
deviations are measured, a difference of 3 per cent. is taken as unit,
and the different classes then have deviations running from (-3)
to (+6). The totals of y arrays, the first moments, and the
second moments of these totals appear in the last three columns
on the right-hand side of Table (32).

To calculate the deviation products, numbers were inserted in
each square on the same principle as in the last example, and the
sums of these products for each x array, that is for each column,
are given in the bottom row of the table (1, 0, 14, 6, etc.), making
in all a total of 126.


Table (32). Correlation between Increment of Unskilled
Wages and Increment of Rents in certain Industrial
Towns of the United Kingdom.

The necessary calculations are as follows:

1. Mean $x = -0.5 + 2(125)/90 = 2.28$,

$\sigma_x = 2\sqrt{\frac{469}{90} - \left(\frac{125}{90}\right)^2} = 2\sqrt{26585}/90$.

2. Mean $y = -1 + 3(75)/90 = 1.50$,

$\sigma_y = 3\sqrt{\frac{305}{90} - \left(\frac{75}{90}\right)^2} = 3\sqrt{21825}/90$.

3. $p = \frac{126}{90} - \left(\frac{125}{90}\right)\left(\frac{75}{90}\right) = \frac{1965}{(90)^2}$,

expressed in class units.

Hence

$r = p/(\sigma_x \sigma_y) = \frac{1965}{(90)^2} \times \frac{90}{\sqrt{26585}} \times \frac{90}{\sqrt{21825}} = 0.08$.

In substituting for $\sigma_x$ and $\sigma_y$ to find r we have omitted the factors
2 and 3 respectively, because the S.D.'s have to be expressed in
the same units as p. Alternatively, if we worked with a difference
of 1 per cent. as unit, instead of taking a difference of 2 per cent.
as unit for x deviations, and a difference of 3 per cent. as unit for
y deviations, each individual product of x and y deviations would
have to be multiplied by 2 × 3. Thus p would then be $6 \times 1965/(90)^2$,
and we should get the same result for r as before by taking $\sigma_x$
and $\sigma_y$ as in (1) and (2) above. In this case r is so small as to
indicate no significant correlation between the two factors
discussed, and the regression lines should therefore be not far from
perpendicular to one another.

[Fig. (26). Horizontal axis: Percentage Increment of Wages.]
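
The remark that r does not depend on the choice of unit can be verified numerically. The Python sketch below uses the totals quoted for this example (90 towns, first moments 125 and 75, product sum 126) and the quantities 26585 and 21825 under the square roots; the variable names are illustrative.

```python
from math import sqrt

n = 90
sum_x, sum_y, sum_xy = 125, 75, 126   # first moments and product sum, class units

# Working in class units (2 per cent. for x, 3 per cent. for y)
p_units = sum_xy / n - (sum_x / n) * (sum_y / n)   # equals 1965 / 90**2
sd_x_units = sqrt(26585) / n
sd_y_units = sqrt(21825) / n
r_units = p_units / (sd_x_units * sd_y_units)

# Working with 1 per cent. as the unit for both variables
p_percent = 2 * 3 * p_units
sd_x_percent = 2 * sd_x_units
sd_y_percent = 3 * sd_y_units
r_percent = p_percent / (sd_x_percent * sd_y_percent)

print(f"r in class units = {r_units:.4f}, r in per cent. units = {r_percent:.4f}")
```

The factors 2 and 3 multiply numerator and denominator alike, so they cancel in the ratio.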

The regression of y on x, or the equation giving the most probable
y for a given type x, is

$(y - 1.50) = r\frac{\sigma_y}{\sigma_x}(x - 2.28)$,

i.e. $y = 0.11x + 1.25$.

Similarly, the regression of x on y is

$x = 0.06y + 2.2$.
To draw the first line we note that it passes through the points
(0, 1.25) and (5, 1.8); also the second line goes through the points
(2.2, 0) and (2.5, 5). The two lines intersect at M (2.28, 1.5), the
mean of the distribution. They are drawn together in fig. (26).
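
That the two regression lines meet at the mean can be checked by solving them simultaneously. The sketch below is illustrative Python; it takes the lines as y = 0.11x + 1.25 and x = 0.06y + 2.2, coefficients that agree with the plotting points quoted in the text, and the helper name `intersection` is hypothetical.

```python
def intersection(b_yx, c_yx, b_xy, c_xy):
    """Solve y = b_yx*x + c_yx and x = b_xy*y + c_xy simultaneously."""
    # Substitute the first line into the second and solve for x.
    x = (b_xy * c_yx + c_xy) / (1 - b_xy * b_yx)
    y = b_yx * x + c_yx
    return x, y

x_m, y_m = intersection(0.11, 1.25, 0.06, 2.2)
print(f"M = ({x_m:.2f}, {y_m:.2f})")
```

The small departure from (2.28, 1.5) comes from the rounding of the coefficients to two figures.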


Table  (33).  Correlation  between  Unskilled  Wages 
AND  Rents  in  certain  Industrial  Towns  of  the 
United  Kingdom. 


X = Index Number for Wages of Unskilled Labour

Example (6). Instead of discussing the Changes in Wages and
Rents between 1905 and 1912, it might be of interest to find the
correlation between index numbers representing Actual Wages and
Rents in October 1912, taken from the same Report. The necessary
data for this purpose appear in Table (33) showing the distribution
of frequency between the different classes: e.g. seven towns were
observed in which the index number for wages was between the
limits (79-84) and the index number for rents was between the
limits (53-60). The wages figures quoted in Table (33) refer only
to unskilled labour in the building trade; the inquiry actually
embraced certain occupations in the building, engineering, and
printing trades, these having been selected as industries which are
found in most industrial towns, and in which the time rates of
wages are largely standardised.

The coefficient of correlation turns out to be 0.46, distinctly larger
than in the previous case. Also the lines of regression are:
(1) $y = 0.47x + 21$. (2) $x = 0.45y + 56$.

Table (34). Correlation between Increment of Working
Class Prices and Increment of Working Class Rents
in certain Industrial Towns of the United Kingdom.

X = Percentage Increment of Prices

Example (7). The Report also furnishes data for evaluating the
correlation between the Increment of Working Class Prices and
Increment of Working Class Rents, again meaning by increment the
percentage increase (+) or decrease (-) between 1905 and 1912
(see Table (34)).

The correlation in this case is very small, being only 0.13. The
regression equations are:

(1) $y = 0.22x - 1.5$, (2) $x = 0.07y + 13$.


PART    II 
CHAPTER XII

INTRODUCTION   TO   PROBABILITY    AND    SAMPLING 

Suppose we wish to know the average measurement of some organ
or  character,  e.g.  length  of  forearm  or  weight  or  anything  similar, 
in  a  large  population  containing  several  thousand  individuals.  The 
mean  obtained  by  actual  measurement  if  it  were  practicable  to 
carry  it  out  on  so  large  a  scale,  would  evidently  depend  to  some 
extent  upon  the  sex,  the  race,  the  age,  the  social  class,  and  so  on, 
of  the  individuals  selected,  and  we  shall  accordingly  assume  our 
population  to  be  composed  of  individuals  of  the  same  race  and  sex, 
at  about  the  same  age,  taken  from  the  same  class,  etc.  ;  it  would  be 
impossible  in  practice  no  doubt  to  secure  that  all  conditions  should 
be  identically  the  same  for  all  the  individuals  observed,  but  the 
population  may  be  as  homogeneous  as  we  care  to  make  it  in  theory. 

Now suppose that, instead of attempting to measure every single
individual, a random sample of 1000 from among the population
be taken and that the mean and variability of the measurements
for this sample be calculated, giving results $m_1$ and $\sigma_1$. With
these may be compared $m_2$ and $\sigma_2$, the results of measuring a second
sample of 1000 individuals, $m_3$ and $\sigma_3$, the results of a third sample,
and so on. It is extremely unlikely that the values obtained for
the m's in this way will equal one another, neither will the $\sigma$'s
be equal; but, if we have succeeded at the beginning in avoiding
all ill-balanced influences when we tried to make the field of
observation as homogeneous as possible, the resulting m's and $\sigma$'s
will only differ from the values of the mean and variability for the
whole population, assuming they could be measured, within a
comparatively small range.

Differences of this kind, which arise merely owing to the fact
that we are often obliged in practice, for lack of time or means, to
deal with a comparatively small sample instead of with the whole
population of which it forms a part, are said to be due to random
sampling. Granted that the samples themselves are adequate in
size (containing, say, from 500 to 1000 individuals each) an
estimate of differences to be expected between one and another can be
made, and unless the observed differences fall outside recognized
limits it is said that they are not significant of any difference other
than such as might quite well be accounted for by random sampling
alone.

In  theory,  then,  we  can  imagine  a  large  number  of  such  random 
samples  selected,  and  by  determining  the  S.D.  of  their  means, 
$m_1, m_2, m_3, \ldots$, we should have a fair measure of the deviation
which  might  quite  well  occur  from  the  true  value,  that  is,  from  the 
mean  of  the  population  as  a  whole,  through  working  only  with  a 
sample.  Further,  a  range  of  two  or  three  times  the  S.D.  on  either 
side  of  the  true  mean  ought  to  take  in  the  majority  of  the  sample 
means  observed. 
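
The idea can be illustrated by simulation. The following Python sketch builds a purely hypothetical population (20,000 roughly normal measurements; the figures 170 and 7 are invented for illustration), draws 200 random samples of 1000, and looks at how the sample means scatter about the population mean.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so that the sketch is repeatable

# Hypothetical population of 20,000 measurements (illustrative values only)
population = [rng.gauss(170.0, 7.0) for _ in range(20_000)]
true_mean = mean(population)

# Draw 200 random samples of 1000 individuals and record each sample mean
sample_means = [mean(rng.sample(population, 1000)) for _ in range(200)]

# S.D. of the distribution of sample means
sd_of_means = pstdev(sample_means)

def share_within(k):
    """Proportion of sample means lying within k S.D.'s of the true mean."""
    inside = sum(abs(m - true_mean) <= k * sd_of_means for m in sample_means)
    return inside / len(sample_means)

within_2sd = share_within(2)
within_3sd = share_within(3)
print(f"S.D. of sample means = {sd_of_means:.3f}")
print(f"within 2 S.D.: {within_2sd:.1%}, within 3 S.D.: {within_3sd:.1%}")
```

The S.D. of the means is much smaller than the S.D. of the individual measurements, and nearly all the sample means fall inside the range of two or three S.D.'s.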

Exactly the same principle holds good in dealing with the
proportion of individuals in a given population which can be assigned
to a particular class, or in discussing the S.D. of the distribution, or
the C. of V., or a coefficient of correlation, or any other statistical
constant, no matter what the nature of the character may be which
is measured or observed, or whether it relates to animate or
inanimate objects. Take, for instance, the variability: by selecting
several samples from a given population we get a series of values
$\sigma_1, \sigma_2, \sigma_3, \ldots$, and in the S.D. of this distribution of variabilities
we have a measure to which we can compare the deviation of any
sample variability, $\sigma_r$, from the true variability of the whole
population, while a range two or three times the S.D. might be expected
to include the majority of the different variabilities met with in
the samples.

Although the S.D., as we have explained, provides quite a
suitable measure of the extent of deviation of a sample constant from
its true value in the population as a whole, in practice, owing to
the historical development of the theory having followed the track
of the normal curve of error [see Chapter xviii.], a measure known
as the probable error and equal roughly to two-thirds of the S.D.
is not seldom employed in its place. The main, if not the sole,
justification for retaining this measure is that it has established its
position by long usage, and in any case it is very easily deduced
from the S.D. by the relation

p.e. = 0.6745 S.D.,

which follows at once from the normal curve and is only strictly
justified when the distribution is normal (see p. 246). Let it suffice
here that instead of simply using the S.D., as might now seem
the obvious course, some writers prefer to multiply the S.D. by a
certain fraction, in which there is no particular virtue except that
which arises through honourable descent, and to work with the
'probable error.'
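
The factor 0.6745 can be recovered from the normal curve itself: it is the deviation, in S.D. units, within which exactly half of a normal distribution lies. A minimal Python sketch using the standard library (the variable names are illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist()  # normal curve with mean 0 and S.D. 1

# The upper quartile of the normal curve gives the probable error factor.
pe_factor = std_normal.inv_cdf(0.75)

# Share of the distribution lying within one probable error of the mean
share_inside = std_normal.cdf(pe_factor) - std_normal.cdf(-pe_factor)

print(f"p.e. = {pe_factor:.4f} S.D.; share within one p.e. = {share_inside:.2f}")
```

A deviation is thus as likely to exceed the probable error as to fall short of it, which holds only when the distribution is normal.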

Since we do not know how much weight to assign to any result
unless the magnitude of its p.e. is also given, results are frequently
stated in the following manner: in a study of the Variation and
Correlation in the Earthworm, by R. Pearl and W. N. Fuller
[Biometrika, vol. iv. pp. 213-229]:

Mean length of worm = 19.171 ± 0.094 cms.,

S.D. = 3.077 ± 0.067 cms.,

C. of V. = 16.049 ± 0.356 per cent.,

meaning that the mean length of the worms measured was 19.171
cms., subject to a probable error of 0.094 cms. which might be in
excess or defect, in other words the mean length lay probably
somewhere between

19.077 cms. and 19.265 cms.;

similar remarks apply to the variability, absolute (S.D.) or relative
(C. of V.).

When the standard deviation (p.e./0.6745) is used as the measure
of  error  due  to  simple  sampling,  the  fact  is  generally  recorded,  and 
it  is  sometimes  spoken  of  as  the  standard  error  in  that  connection, 
but,  as  it  seems  unnecessary  to  multiply  names  for  ideas  which  are 
not  really  new,  only  that  they  appear  in  a  new  setting,  we  shall 
not  employ  the  term. 
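
As a small worked illustration, the Python sketch below reproduces the range quoted for the mean length of worm and converts the probable error back into the corresponding standard deviation. The function names are hypothetical.

```python
PE_FACTOR = 0.6745  # p.e. = 0.6745 S.D. for a normal distribution

def probable_error_range(value, pe):
    """Limits obtained by taking the probable error in defect and in excess."""
    return value - pe, value + pe

def standard_deviation_from_pe(pe):
    """Standard deviation of sampling corresponding to a probable error."""
    return pe / PE_FACTOR

low, high = probable_error_range(19.171, 0.094)
sd_of_mean = standard_deviation_from_pe(0.094)

print(f"mean length between {low:.3f} and {high:.3f} cms.")
print(f"S.D. of sampling for the mean = {sd_of_mean:.3f} cms.")
```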

It must be clearly understood that no outstanding and
predictable cause exists, by our hypothesis, for such differences as occur
in the statistical constants between one sample and another: they
are the resultant effect of a complex of forces which cannot be
properly traced, still less measured, apart from one another, and
which have been happily described as that 'mass of floating causes
generally known as chance.' Since therefore the forces coming
into play, under the ideal conditions formulated, are of the same
chance nature as those affecting the spin of a well-balanced coin
or the selection of a card from a smooth and well-shuffled pack,
it may be expected that the resulting distribution of means,
$m_1, m_2, m_3, \ldots$, of S.D.'s, $\sigma_1, \sigma_2, \sigma_3, \ldots$, and of all the other
constants will likewise be subject to the same laws of probability
which serve to describe within limits what happens in the case of
coin or card. It follows that some acquaintance with the first
elements of mathematical probability is essential if one is to
understand the theory of sampling, and a short digression must here
be made in order to introduce that subject. This will be found
to lead directly to a solution, under certain prescribed conditions,
in the simple case when the character observed is an attribute like
complexion, fair or dark, or like birth, male or female, which can
only fall into one of two definite classes and when every one
observation in the sample is independent of every other. In the more
general case where the character observed is capable of direct
measurement and may lie in magnitude anywhere along a scale
of values divided up into a number of different classes, it is not
so easy to determine the effect of random sampling, because it is
not possible, as it is in the previous case, actually to draw up a
frequency table describing in detail the character of the
distribution to be expected from theory in any given sample.

The  idea  contained  in  the  word  probability  is  one  familiar  to  us 
in  our  everyday  talk,  but  if  we  seek  to  analyse  it  as  used  we  find 
it as elusive as the personality of the user. A remarks: 'Wars
will probably be stamped out, like duelling, in the course of time.'
B replies: 'No! fighting will probably go on as long as the world
lasts; you can't change human nature.' Now the amount of
credence we are prepared to give to each of these statements is
vague and uncertain until we know something about A and B
themselves and the value of their judgment, quite apart from the
influence of our own opinion upon the matter; perhaps A is an
optimist or B is a pessimist, and in estimating the 'probably'
used by each we must allow for these facts. Probability, then, in
ordinary  conversation,  is  something  largely  subjective  :  it  has  a 
varying  significance  according  to  the  person  who  uses  the  word 
and,  unless  we  could  get  rid  of  this  personal  element,  it  would  be 
hopeless  to  try  and  approach  it  along  scientific  lines. 

Mathematical  probability  is  unlike  colloquial  probability  in  that 
all  the  uncertainty  is  taken  out  of  it,  or  at  least  the  uncertainty  is 
confined  within  defined  limits.  We  shall  only  touch  the  fringe  of 
the  subject  in  this  book,  and  what  we  have  to  say  may  be  best 
introduced  by  considering  some  examples  which  may  appear  trivial, 
but  they  possess  the  merit  that  no  personal  bias  can  enter  into 
their  discussion  to  distort  the  results.  The  reader  must  not  be 
impatient  at  their  artificial  character  :  in  many,  if  not  in  all, 
branches of science, before tackling any particular problem as it
actually exists, it is helpful to examine what can be deduced in a
simple case free from all complication, and, having settled that,
we try to see how the results are affected when we come to allow
one by one for the various complicating factors which exist. For
example,  in  Astronomy,  the  track  of  a  planet  in  space  may  first  be 
found  on  the  hypothesis  that  the  sun  alone  is  the  compelling  influence. 
Then  we  may  proceed  to  discuss  how  it  is  deflected  from  its  path 
when  the  gravitational  influence  of  neighbouring  planets  also  is 
taken  into  account. 

Let  us  start  with  an  ordinary  pack  of  playing  cards,  and,  after 
shuffling, turn up one card. Can we measure the probability that
this card shall be (1) the 7 of spades? (2) some spade?

Altogether  there  are  52  cards,  and  we  will  suppose  that  the 
cards  are  so  cut  and  so  smooth  that  each  of  the  52  has  an  equal 
chance of being turned up: for instance, there is to be no stickiness
or anything to help any particular card to evade us by sticking
fast to its neighbour. Now we are certain to turn up some card
and there are 52 different possibilities, each of them by hypothesis
equally probable. If, then, we agree to denote certainty by unity,
we must divide 1 into 52 equal parts and assign one part to each
card as the probability of its appearance.

1. The probability (or chance as it is sometimes called) of turning
up  any  stated  card,  such  as  the  7  of  spades,  is  therefore  1  out  of  52, 
i.e.  1/52. 

2.  Again,  since  there  are  13  spades  in  all,  the  chance  of  turning 
up  some  spade  is  13  out  of  52,  i.e.  13/52=1/4. 

These  results  may  be  put  in  another  way  which  is  often  useful. 
If  the  experiment  is  repeated  a  great  number  of  times,  a  return  to 
the  initial  conditions  of  the  problem  being  made  after  each  trial 
by  replacing  the  card  drawn  and  reshuffling  the  pack,  we  should 
expect  to  turn  up  the  7  of  spades  on  the  average  about  once  in 
every  52  experiments,  and  we  should  expect  to  turn  up  some  spade 
on  the  average  about  once  in  every  4  experiments.  This  must 
not  be  taken  to  mean  that  in  4  experiments  we  are  sure  to  turn 
up just one spade (a trial will readily prove such a statement to
be untrue), but that, if we went on performing experiment after
experiment,  we  should  in  the  long  run  get  a  proportion  of  about 
1  spade  to  every  4  experiments  and  a  trial  will  likewise  prove  the 
truth  of  this  statement. 
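
Such a trial is easily imitated on a computer. The Python sketch below draws one card from a full pack 52,000 times, replacing the card each time, and compares the observed proportions with 1/52 and 1/4. The seed and the number of trials are arbitrary choices made for illustration.

```python
import random

SUITS = ["spades", "hearts", "diamonds", "clubs"]
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [(rank, suit) for suit in SUITS for rank in RANKS]

rng = random.Random(1)  # fixed seed so that the sketch is repeatable
trials = 52_000

# Each draw is made from the full pack, i.e. the card is replaced every time.
draws = [rng.choice(deck) for _ in range(trials)]

freq_seven_of_spades = sum(card == ("7", "spades") for card in draws) / trials
freq_spade = sum(suit == "spades" for _, suit in draws) / trials

print(f"7 of spades: {freq_seven_of_spades:.4f} (theory {1 / 52:.4f})")
print(f"some spade:  {freq_spade:.4f} (theory {1 / 4:.4f})")
```

In any short run of 4 draws the number of spades varies; it is only the long-run proportion that settles near 1/4.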

Generally, when an event can happen in n different ways
altogether, and among these different ways there are a which give
what might be called successful events, the probability of success
at any single happening is a out of n, i.e. a/n, and is usually denoted
by the letter p, and the probability of failure is (n-a) out of n,
i.e. (n-a)/n, and is usually denoted by the letter q.

Clearly (p+q)=1, and this is reasonable because we are certain
to get either a success or a failure at a single trial and unity was
fixed as the measure of certainty. In k trials, the probable number
of successes would be kp and of failures kq, because in n trials, on
the average, there are a, or np, successes and (n-a), or nq, failures.

Example (1). In the second case considered above, the
probability of success (turning up a spade) is a out of n

= a/n = 13/52 = 1/4 = p,

and the probability of failure (not turning up a spade, i.e. turning
up one of 39 other cards) is (n-a) out of n

= (n-a)/n = 39/52 = 3/4 = q.
And (p+q) = 1/4 + 3/4 = 1.

Example  (2). — What  is  the  chance  of  drawing  either  a  picture 
card  or  an  ace  from  the  pack  at  a  single  trial  ? 

Altogether  there  are  12  picture  cards,  and  the  chance  of  drawing 
any  one  of  them  is  thus  12  out  of  52 

=  12/52=3/13; 

and  the  chance  of  drawing  any  one  of  the  4  aces  is  4  out  of  52 

=4/52=1/13. 

Hence  the  total  probability  required 

=3/13+1/13=4/13. 

Generally, if the probability of one type of event is $p_1$, and the
probability of a second type of event is $p_2$, and if either type is
reckoned a success, then the total probability of success is $(p_1+p_2)$.
This evidently holds good however many different types there
may be, and even if there is only one event of each type.
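
These results can be confirmed by direct counting. The Python sketch below builds a pack of 52 cards and computes each probability as (successful ways)/(total ways) using exact fractions; the helper `probability` is a hypothetical name. The addition of probabilities works here because the two types do not overlap: no card is both a picture card and an ace.

```python
from fractions import Fraction

SUITS = ["spades", "hearts", "diamonds", "clubs"]
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [(rank, suit) for suit in SUITS for rank in RANKS]

def probability(is_success):
    """a/n: successful cards divided by all cards in the pack."""
    a = sum(1 for card in deck if is_success(card))
    return Fraction(a, len(deck))

PICTURES = {"J", "Q", "K"}

p_spade = probability(lambda card: card[1] == "spades")
q_spade = 1 - p_spade

p_picture = probability(lambda card: card[0] in PICTURES)
p_ace = probability(lambda card: card[0] == "A")
p_picture_or_ace = probability(lambda card: card[0] in PICTURES or card[0] == "A")

print(p_spade, q_spade)                      # p and q for Example (1)
print(p_picture, p_ace, p_picture_or_ace)    # Example (2)
```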

Consider  now  the  simultaneous  happening  of  two  events,  one  of 
which  can  happen  in  n  different  ways,  a  among  which  are  to  be 
regarded  as  successful,  and  the  second  can  happen  in  n'  different 
ways,  a'  among  which  are  to  be  regarded  as  successful.  Further, 
the  two  events  are  to  be  absolutely  independent  of  one  another 
in  the  sense  that  neither  is  to  influence  the  success  or  failure  of 
the  other.     What  is  the  probability  of  a  double  success  occurring  ? 

The total number of different combinations of the two events
possible is nn', for any one of the n possible happenings for the
first event can be combined with any one of the n' possible
happenings for the second event. Also the total number of different
combinations of two successes possible is aa', for any one of the
a possible successes for the first event can be combined with any
one of the a' possible successes for the second event. Hence,
according to our definition of probability, the probability of a double
success is aa' out of nn' = aa'/nn' = (a/n)(a'/n').

Thus to get the probability of a double success for a combination
of two independent events we must multiply together the separate
probabilities for the success of each event taken by itself.

Similarly, in the above case, the probability of a double failure
= (n-a)(n'-a')/nn'; and the probability of one success and one
failure

$= \frac{a}{n} \cdot \frac{n'-a'}{n'} + \frac{n-a}{n} \cdot \frac{a'}{n'}$,

for the first event can be a success and the second a failure or the
first a failure and the second a success.

Here, again, if we take all the different possibilities into account,
and add the probabilities corresponding to each case, we arrive
at certainty, the measure of which is unity, thus:

probability of 2 successes = aa'/nn',

probability of 1 success and 1 failure = a(n'-a')/nn' + a'(n-a)/nn',

probability of 2 failures = (n-a)(n'-a')/nn'.

Therefore total probability, all cases,

$= \frac{aa'}{nn'} + \frac{a(n'-a')}{nn'} + \frac{a'(n-a)}{nn'} + \frac{(n-a)(n'-a')}{nn'}$

$= (aa' + an' - aa' + a'n - a'a + nn' - na' - an' + aa')/nn'$
$= nn'/nn'$
$= 1$.

Example. Take two packs of cards. What is the probability
of drawing an ace from the first pack and a king, queen, or knave
from the second pack?

Here a=4, n=52, a'=12, n'=52; hence the required probability

$= aa'/nn' = \frac{4}{52} \times \frac{12}{52} = \frac{3}{169} = 1/56\tfrac{1}{3}$.

Thus  we  might  expect  to  succeed  on  the  average  about  once  in 
56  trials. 
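
The multiplication rule can be checked by listing every combination of a card from the first pack with a card from the second. The Python sketch below counts the nn' = 52 × 52 pairs directly and compares the result with the product of the separate probabilities; all names are illustrative.

```python
from fractions import Fraction
from itertools import product

SUITS = ["spades", "hearts", "diamonds", "clubs"]
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
pack = [(rank, suit) for suit in SUITS for rank in RANKS]

def is_ace(card):
    return card[0] == "A"

def is_picture(card):
    return card[0] in {"J", "Q", "K"}   # king, queen, or knave

# Every combination of one card from each pack
pairs = list(product(pack, pack))
total = len(pairs)

two_successes = sum(1 for c1, c2 in pairs if is_ace(c1) and is_picture(c2))
two_failures = sum(1 for c1, c2 in pairs if not is_ace(c1) and not is_picture(c2))
one_of_each = total - two_successes - two_failures

p_by_counting = Fraction(two_successes, total)
p_by_rule = Fraction(4, 52) * Fraction(12, 52)

p_total = (Fraction(two_successes, total)
           + Fraction(one_of_each, total)
           + Fraction(two_failures, total))

print(p_by_counting, p_by_rule, p_total)
```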
We proceed to discuss the case of a coin spun a number of times
in  succession,  and  we  shall  find  the  probabilities  of  the  appearance 
of  so  many  heads  (H)  and  so  many  tails  (T)  in  so  many  spins  on  the 
hypothesis  that  the  coin  is  perfectly  balanced  and  equally  likely 
to  fall  on  either  side. 

In  1  spin  there  are  2  possible  events,  namely  H  or  T,  which 
we  shall  write  simply  as 

(H,  T). 

In  2  spins  there  are  4  possible  events,  because  we  can  combine 
the  H  or  T  of  the  first  with  an  H  or  T  at  the  second  spin,  and  we 
may  express  the  result  thus 

(H,  T)(H,  T)=(HH,  HT,  TH,  TT) ; 

the  interpretation  of  which  is  that  we  may  get  either  head  followed 
by  head,  or  head  followed  by  tail,  or  tail  followed  by  head,  or  tail 
followed  by  tail. 

In  3  spins  there  are  8  possible  events,  because  we  can  combine 
the  4  events  previously  possible  with  an  H  or  T  at  the  third  spin, 
thus  getting 

(H,  T)(H,  T)(H,  T) 

=  (H,  T)(HH,  HT,  TH,  TT) 

=  (HHH,  HHT,  HTH,  HTT,  THH,  THT,  TTH,  TTT) ; 

the  interpretation  of  which  is  that  we  may  get  either  3  heads  in 
succession,  or  2  heads  followed  by  1  tail,  or  head  followed  by  tail 
followed  by  head,  and  so  on. 

In  4  spins  there  are  16  possible  events,  because  we  can  combine 
the 8 events previously possible with an H or T at the fourth spin,
thus 

(H,  T)(HHH,  HHT,  HTH,  HTT,  THH,  THT,  TTH,  TTT) 
=  (HHHH,    HHHT,    HHTH,    HHTT,    HTHH,    HTHT, 
HTTH,  HTTT,    THHH,    THHT,    THTH,    THTT, 
TTHH,  TTHT,  TTTH,  TTTT). 

But  the  method  here  adopted  to  get  the  possible  events  at  each 
stage  is  precisely  the  same  as  that  which  gives  the  successive  terms 
in  the  ordinary  algebraical  expansions  of 

(H+T), (H+T)(H+T), (H+T)(H+T)(H+T), etc.
Also each new spin has the effect of doubling the number of possible
events obtained at the previous spin, and we conclude that in
n spins, there are

(2 × 2 × 2 × ... to n factors),
or $2^n$, possible events, and these events are given by the successive
terms in the expansion of

[(H+T)(H+T)(H+T) ... to n factors].
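
The successive products above are what a Cartesian product generates, so the list of possible events is easily produced mechanically. A minimal Python sketch (the function name `spin_events` is illustrative):

```python
from itertools import product

def spin_events(n):
    """All possible sequences of H and T in n spins, in the order of the expansion."""
    return ["".join(seq) for seq in product("HT", repeat=n)]

for n in range(1, 5):
    events = spin_events(n)
    print(n, len(events), events)
```

Each extra spin doubles the length of the list, giving 2, 4, 8, 16 events for 1, 2, 3, 4 spins.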

Let  us  now  consider  the  probabilities  of  the  different  events 
obtainable.  The  important  point  to  notice  is  that  at  any  stage 
each  possible  event  has  exactly  the  same  probability,  for  there  is 
no  reason  why  any  particular  spin  should  give  H  rather  than  T, 
or  T  rather  than  H  :  for  example,  in  3  spins  there  are  8  possible 
events,  each  by  itself  equally  probable,  and  we  therefore  divide 
the  unity  of  certainty  into  8  equal  parts  and  assign  one  part  to  each 
event,  thus 

probability of 3 heads:            HHH = 1/8
probability of 2 heads and 1 tail: HHT = 1/8
                                   HTH = 1/8
                                   THH = 1/8
probability of 1 head and 2 tails: HTT = 1/8
                                   THT = 1/8
                                   TTH = 1/8
probability of 3 tails:            TTT = 1/8.

It is clear from this arrangement that, if the order of the
appearance of H and T is indifferent, some events are of the same type
and some types are likely to appear oftener than others, e.g. the
probability of getting '2 heads and 1 tail' (or '1 head and 2 tails')
is three times as great as the probability of getting '3 heads'
or '3 tails.' Hence for conciseness it is convenient to adopt the
ordinary index notation and write

$HHH = H^3$, $HHT = H^2T$, $HTH = H^2T$, etc.,
so that the possible events in 3 spins are

$H^3$, $3H^2T$, $3HT^2$, $T^3$;
in 4 spins they are

$H^4$, $4H^3T$, $6H^2T^2$, $4HT^3$, $T^4$;
and so on.

The probability of any particular type is now readily written
down : e.g. in 4 spins, the probability of getting 2 heads and 2 tails

= (number of successful events possible)/(total number of events possible)
$= 6/2^4 = 6/16 = 3/8$.

But the binomial expansion always sums together terms of the
same type for us in just the manner wanted, and we have the
possible events in n spins given by the successive terms in the
expansion of

(H+T)(H+T)(H+T) ... to n factors,
i.e. $(H+T)^n$,

i.e. $H^n + {}^nC_1 H^{n-1}T + {}^nC_2 H^{n-2}T^2 + \dots + T^n$,

and therefore again the probability of any particular combination
is readily written down : e.g. probability of '(n-2) heads, 2 tails'
= (number of successful events possible)/(total number of events possible)
$= {}^nC_2/2^n$.
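The counting argument just given may be checked mechanically. What follows is only a minimal sketch in Python, a language chosen here for illustration; the names `type_counts` and `prob_heads` are our own and form no part of the original treatment. It lists every one of the $2^n$ equally likely sequences, sorts them into types by the number of heads, and compares the counts with the binomial coefficients.

```python
from fractions import Fraction
from itertools import product
from math import comb


def type_counts(n):
    """For n spins, count how many of the 2**n sequences contain 0, 1, ..., n heads."""
    counts = [0] * (n + 1)
    for sequence in product("HT", repeat=n):   # every possible event, e.g. ('H', 'T', 'H')
        counts[sequence.count("H")] += 1
    return counts


def prob_heads(n, heads):
    """Probability of a given number of heads: successful events / total events."""
    return Fraction(comb(n, heads), 2 ** n)


print(type_counts(3))      # [1, 3, 3, 1]
print(type_counts(4))      # [1, 4, 6, 4, 1]
print(prob_heads(4, 2))    # 3/8
```

The list for 4 spins reproduces the coefficients 1, 4, 6, 4, 1 of the expansion, and the last line gives the 6/16 found above.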

Another  way  of  stating  the  result  obtained  is  to  say  that  we 
might  expect  to  get 

n heads appearing on the average about once in every $2^n$ trials,
(n-1) heads, 1 tail appearing on the average about ${}^nC_1$ times in every $2^n$ trials,
(n-2) heads, 2 tails appearing on the average about ${}^nC_2$ times in every $2^n$ trials,

and  so  on. 

If, in accord with our previous notation, we call the appearance
of, say, H at any spin a 'success,' and label its probability, 1/2, by the
letter p, and if consequently the appearance of T at any spin is a
'failure,' its probability, 1/2, to be labelled by the letter q, we have the
probabilities of the different combinations of events in $(H+T)^n$, or

$H^n + {}^nC_1 H^{n-1}T + {}^nC_2 H^{n-2}T^2 + \dots + T^n$,
given by the corresponding terms in $(p+q)^n$, or

$p^n + {}^nC_1 p^{n-1}q + {}^nC_2 p^{n-2}q^2 + \dots + q^n$,
where p = q = 1/2.

After each spin of the coin in the case considered the distribution
of probabilities was symmetrical, e.g. after the fourth spin the
probabilities were

1/16, 4/16, 6/16, 4/16, 1/16.

We  pass  on  now  to  a  case  where  the  distribution  is  not  symmetrical, 
owing  to  the  fact  that  p  and  q  are  no  longer  equal  for  any  isolated 
event. 

Consider the throw of an ordinary die in which each of the six
faces is assumed to have an equal chance of appearing uppermost.
The probability of throwing, say, a 3 is 1/6, since we are certain
to throw either 1, 2, 3, 4, 5, or 6 ; and the probability of failing to
throw a 3 is 5/6, since we are certain either to throw a 3 or not
to throw a 3.

If we represent the probability of success (say, in this case,
throwing a 3) by p (i.e. 1/6), and failure (i.e. in this case, failing
to throw a 3) by q (i.e. 5/6), we have

p + q = 1/6 + 5/6 = 1.
Bearing in mind then that the probability for a combination of two
independent events is determined by multiplying together the
separate probabilities for each, we have the following table showing
what might be expected when 1, 2, or 3 dice are thrown up together,
where s stands for success and f for failure :


No. of Dice   Different Possibilities.                   Different Probabilities.
thrown.
1             s, f.                                      p, q.
2             ss, sf, fs, ff.                            pp, pq, qp, qq.
3             sss, ssf, sfs, sff, fss, fsf, ffs, fff.    ppp, ppq, pqp, pqq, qpp, qpq, qqp, qqq.

The table is easily extended on the same principle, and at each
step, it will be noticed, a fresh pair of possibilities, s or f, is
introduced, with corresponding p or q, to be combined with what has
gone before.

If the order of appearance of s and f is a matter of indifference,
e.g. if it does not matter whether the first die shows s and the
second f, or vice versa, so that results of the type sff and fsf may
be regarded as equivalent, we may use the index notation, as in
the coin case, to render the table more concise, thus :


No. of Dice   Different Possibilities.            Corresponding Probabilities.
thrown.
1             $s$, $f$.                           $p$, $q$.
2             $s^2$, $2sf$, $f^2$.                $p^2$, $2pq$, $q^2$.
3             $s^3$, $3s^2f$, $3sf^2$, $f^3$.     $p^3$, $3p^2q$, $3pq^2$, $q^3$.

When, therefore, n dice are thrown we again recognize the
different possibilities as given by the successive terms in the
expansion of $(s+f)^n$, namely

$s^n + {}^nC_1 s^{n-1}f + {}^nC_2 s^{n-2}f^2 + \dots + f^n$,

and the corresponding probabilities by the successive terms in the
expansion of $(p+q)^n$, namely

$p^n + {}^nC_1 p^{n-1}q + {}^nC_2 p^{n-2}q^2 + \dots + q^n$.

Hence the probability of throwing n threes $= p^n = 1/6^n$ ;

(n-1) threes $= n p^{n-1} q = n \cdot \frac{1}{6^{n-1}} \cdot \frac{5}{6} = 5n/6^n$ ;

(n-2) threes $= \frac{n(n-1)}{1\cdot 2}\, p^{n-2} q^2 = \frac{n(n-1)}{1\cdot 2} \cdot \frac{1}{6^{n-2}} \cdot \frac{5^2}{6^2} = 25n(n-1)/(2 \cdot 6^n)$ ;

and so on.
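As a check upon these expressions, the sketch below (in Python, with exact fractions; the function name `prob_successes` and the choice of n = 4 dice are assumptions made purely for illustration) evaluates the general term ${}^nC_k\,p^k q^{n-k}$ and compares it with the closed forms just written down.

```python
from fractions import Fraction
from math import comb


def prob_successes(n, k, p):
    """Probability of exactly k successes among n independent events: nCk * p^k * q^(n-k)."""
    q = 1 - p
    return comb(n, k) * p ** k * q ** (n - k)


p = Fraction(1, 6)   # chance of a 3 with one die
n = 4                # illustrative number of dice

print(prob_successes(n, n, p))        # n threes
print(prob_successes(n, n - 1, p))    # (n-1) threes
print(prob_successes(n, n - 2, p))    # (n-2) threes
print(sum(prob_successes(n, k, p) for k in range(n + 1)))   # all terms together
```

The three printed terms agree with $1/6^n$, $5n/6^n$ and $25n(n-1)/(2\cdot 6^n)$ for n = 4, and the terms of the whole expansion add up to unity, as they must since p + q = 1.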

The result we have just obtained is of perfectly general application.
Whether we spin n coins, in which the probability, p, of
success (say 'heads') for each is 1/2, or throw n dice, in which the
probability, p, of success (say 'to get a 3') for each is 1/6, or have
any n similar but independent events happening in which the
probability of success for each is p, the different resulting
possibilities as to success are given by the successive terms in the
expansion of $(s+f)^n$, and their corresponding probabilities are given by
the successive terms in the expansion of $(p+q)^n$.

We are thus in a position to form a frequency table, like that on
p. 53, showing the probabilities of getting 0, 1, 2 ... n successes
(in other words, the proportional frequencies of these different
numbers of successes) at the occurrence of n similar independent
events, where p is the probability of success for each and q is the
probability of failure :

Table (35). Binomial Distribution.

(1)          (2)                                            (3)                                          (4)
Number of    Frequency.                                     Product of Nos. in                           Product of Nos. in
Successes.                                                  Cols. (1) & (2).                             Cols. (1) & (3).
(x)          (f)                                            (fx)                                         (fx^2)
0            $q^n$                                          0                                            0
1            $nq^{n-1}p$                                    $nq^{n-1}p$                                  $nq^{n-1}p$
2            $\frac{n(n-1)}{1\cdot 2}q^{n-2}p^2$            $n(n-1)q^{n-2}p^2$                           $2n(n-1)q^{n-2}p^2$
3            $\frac{n(n-1)(n-2)}{1\cdot 2\cdot 3}q^{n-3}p^3$   $\frac{n(n-1)(n-2)}{1\cdot 2}q^{n-3}p^3$   $\frac{3n(n-1)(n-2)}{1\cdot 2}q^{n-3}p^3$
...          ...                                            ...                                          ...
n            $p^n$                                          $np^n$                                       $n^2p^n$
Total        1                                              $np$                                         $np[1+p(n-1)]$

Col. (1) gives the deviations from the origin of measurement,
which in this case is taken as 'no successes,' the class interval
being equal to a difference of 1 in the number of successes.

The summations of the last three columns are effected as
follows :

Col. (2).
$q^n + nq^{n-1}p + \frac{n(n-1)}{1\cdot 2}q^{n-2}p^2 + \dots + p^n$
$= (q+p)^n$
$= 1$, because p + q = 1.

Col. (3).
$nq^{n-1}p + n(n-1)q^{n-2}p^2 + \frac{n(n-1)(n-2)}{1\cdot 2}q^{n-3}p^3 + \dots + np^n$
$= np\left[q^{n-1} + (n-1)q^{n-2}p + \frac{(n-1)(n-2)}{1\cdot 2}q^{n-3}p^2 + \dots + p^{n-1}\right]$
$= np(q+p)^{n-1}$
$= np$.

Col. (4).
$nq^{n-1}p + 2n(n-1)q^{n-2}p^2 + \frac{3n(n-1)(n-2)}{1\cdot 2}q^{n-3}p^3 + \dots + n^2p^n$
$= np\left[1 + (n-1)p(q+p)^{n-2}\right]$
$= np[1 + p(n-1)]$.

The arithmetic mean of the distribution

= sum of terms in col. (3)/sum of terms in col. (2)

$= \Sigma(fx)/\Sigma(f)$

$= np$.

The mean-square deviation referred to zero as origin, zero in this
case corresponding to 'no successes'

= sum of terms in col. (4)/sum of terms in col. (2)

$= \Sigma(fx^2)/\Sigma(f)$

$= np[1 + p(n-1)]$.

Thus the standard deviation, $\sigma$, is given by

$\sigma^2 = np[1 + p(n-1)] - \bar{x}^2$,

where $\bar{x}$ is the deviation of the mean from the origin of
measurement, so that $\bar{x} = np$.

Therefore $\sigma^2 = np[1 + p(n-1)] - n^2p^2$

$= np(1-p) + n^2p^2 - n^2p^2$

$= npq$.
Hence $\sigma = \sqrt{npq}$,

and p.e. $= 0.6745\sqrt{npq}$.
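The column sums of the binomial table can be verified numerically for any particular n and p. The following sketch (Python, exact fractions; the function name `binomial_moments` and the values n = 10, p = 1/6 are illustrative assumptions only) builds the frequency column term by term and forms the mean and the mean-square deviation in the manner described.

```python
from fractions import Fraction
from math import comb


def binomial_moments(n, p):
    """Return (sum of f, mean, mean-square deviation about zero, variance) for (q + p)^n."""
    q = 1 - p
    f = [comb(n, x) * q ** (n - x) * p ** x for x in range(n + 1)]   # col. (2)
    total = sum(f)
    mean = sum(x * fx for x, fx in enumerate(f)) / total             # col. (3) / col. (2)
    mean_sq = sum(x * x * fx for x, fx in enumerate(f)) / total      # col. (4) / col. (2)
    return total, mean, mean_sq, mean_sq - mean ** 2


n, p = 10, Fraction(1, 6)
total, mean, mean_sq, variance = binomial_moments(n, p)
print(total, mean, mean_sq, variance)
```

For these values the four quantities agree exactly with 1, $np$, $np[1+p(n-1)]$ and $npq$ respectively.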

These  two  results  are  exceedingly  important,  and  it  is  essential 
to  understand  what  it  is  they  measure.  An  example  may  help 
to  make  this  clear. 

If we spin 300 coins, counting 'head' for each a success, the
number of heads we shall get will be unlikely to differ very greatly
from the average or mean number of successes, np, i.e. 150 if p = 1/2
for each coin, and in the long run, if we repeat the experiment a
great number of times, we shall get on the average about 150 heads
to every one experiment. Again, if we throw 300 dice, counting
every throw of the number 5, say, for each die a success, so that
p in this case = 1/6, the number of fives we shall get will be unlikely
to differ much from np, i.e. 50, and in the long run, if we repeat the
experiment a great number of times, we shall get on the average
a proportion of about 50 fives to every experiment ; we should
find, for example, something like 5000 fives if we threw 300 dice
100 times in succession. The arithmetic mean of the distribution
tells us therefore about what number of successes to expect in one
experiment with n events if n is fairly large, though we should be
unlikely to get exactly this number if we confined ourselves to the
one experiment.

The second result, the S.D., supplies us with a measure of the
unlikelihood of getting the exact number of successes expected at
any single experiment, for it defines the dispersion of the different
numbers of possible successes about their average. Clearly the
greater the dispersion, the greater is the likelihood of missing the
average. The mean number of successes when an experiment is
repeated a great number of times is np, but at any single experiment
it is not unlikely that the number of successes obtained may
differ from np by as much as $0.6745\sqrt{npq}$ in excess or in defect ;
it is, however, unlikely, as we shall see later (p. 244), that the
number will differ from np by more than $3\sqrt{npq}$ in excess or
defect when the distribution is not very skew, or unsymmetrical,
especially if n be large. The probable error in the case above when
we throw a sample of 300 dice is

$= 0.6745\sqrt{300 \times 1/6 \times 5/6} = 0.6745\sqrt{41.67} = 4.4$,

and it is therefore quite likely that the number of fives obtained
at one experiment will differ from the expected number, 50, by as
much as 4 or 5 in excess or defect, but it is unlikely that the number
will fall outside the limits $50 \pm 3\sqrt{41.67}$, say 30 to 70.
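The arithmetic of the 300 dice may be reproduced as follows. This is a small sketch in Python; the helper name `binomial_summary` and the constant name `PE_FACTOR` are our own labels, not notation from the text.

```python
from math import sqrt

PE_FACTOR = 0.6745   # ratio of probable error to standard deviation


def binomial_summary(n, p):
    """Mean, standard deviation and probable error of the number of successes."""
    q = 1 - p
    sd = sqrt(n * p * q)
    return n * p, sd, PE_FACTOR * sd


mean, sd, pe = binomial_summary(300, 1 / 6)
print(round(mean), round(sd ** 2, 2), round(pe, 1))   # expected number, npq, probable error
print(mean - 3 * sd, mean + 3 * sd)                   # limits of three standard deviations
```

The limits printed in the last line fall a little inside 30 and 70, which is why the text rounds them outward to "say 30 to 70."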

It is sometimes more convenient to refer to the proportion of
successes, etc., expected at any experiment rather than to the
actual number expected. In that case, since with n events the
expected number of successes is pn, but the number obtained may
quite likely differ from this by $\pm 0.6745\sqrt{npq}$, therefore with
n events the expected proportion of successes is pn/n, i.e. p, with
quite possibly an error $\pm 0.6745\sqrt{npq}/n$, i.e. $\pm 0.6745\sqrt{pq/n}$.

Thus, with the 300 dice, the expected proportion of successes at
one experiment lies between

$[1/6 - 0.6745\sqrt{(1/6 \times 5/6) \div 300}]$ and $[1/6 + 0.6745\sqrt{(1/6 \times 5/6) \div 300}]$

i.e. $(1/6 - 0.6745/46.5)$ and $(1/6 + 0.6745/46.5)$

i.e. 1/6.6 and 1/5.5 ;

and it is unlikely that the proportion will differ from 1/6 by more
than 3/46.5, i.e. 1/15.5.
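The limits for the proportion can be obtained in the same way. In the sketch below (Python; `proportion_limits` is a hypothetical helper introduced only for this illustration) the error term is $0.6745\sqrt{pq/n}$, and the reciprocals are printed so that they may be compared with the fractions quoted above.

```python
from math import sqrt


def proportion_limits(n, p, factor=0.6745):
    """Limits p -/+ factor * sqrt(pq/n) for the proportion of successes in n events."""
    q = 1 - p
    err = factor * sqrt(p * q / n)
    return p - err, p + err


low, high = proportion_limits(300, 1 / 6)
print(round(1 / sqrt((1 / 6) * (5 / 6) / 300), 1))   # the divisor 46.5
print(round(1 / low, 1), round(1 / high, 1))         # 6.6 and 5.5
```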

To  illustrate  how  the  binomial  distribution  might  be  directly 
applied,  an  experiment  was  made  with  900  digits  selected  at  random 
by  taking  in  succession  the  digits  in  the  seventh  decimal  place  in 
the  logarithms  of  the  following  numbers  : — 

10054,  10154,  10254,  .  .  .  99954, 

as  given  in  Chambers's  Mathematical  Tables.  In  this  way  each  of 
the  10  digits,  0,  1,  2,  3  ...  9,  may  be  supposed  to  have  stood  an 
equal  chance  of  selection  each  time  one  was  written  down.  Gaps 
of 100 were left between the numbers selected so as to avoid runs
of the same figure which sometimes occur even in the seventh
decimal place owing to lack of independence.

The  digits  were  arranged  in  36  columns,  each  column  containing 
25  digits,  and  in  this  way  we  obtained  what  was  equivalent  to 
36  separate  but  like  experiments  with  25  events  each.  If  we  agree 
to  regard  the  appearance  of  a  7  or  an  8  as  a  successful  event,  and 
the  appearance  of  any  other  digit  as  a  failure,  the  chance  of  success 
at  any  appearance  is  2/10,  and  the  chance  of  failure  is  8/10.  The 
case  is  thus  of  exactly  the  same  kind  as  that  of  throwing  25  dice 
36 times in succession, and if the probability of success, namely 1/5,
for each independent event, be denoted by p, and the probability
of failure, namely 4/5, by q, the distribution of successes and failures
should approximately conform to that given by the expansion of

$(p+q)^{25}$

for any particular experiment, and since the experiment was
repeated 36 times, the total numbers of successes and failures of
different orders obtained should approximately conform to

$36(p+q)^{25}$,

for if the probability of an event is p the number of events to be
expected in N trials is Np.

The actual distribution observed is compared with that given
by the binomial expansion in Table (36). Col. (2) is obtained by
picking out the appropriate terms in the expansion of $36(p+q)^{25}$,
where p = 1/5, q = 4/5 ; this expansion is

$36\left(p^{25} + 25p^{24}q + \frac{25\cdot 24}{1\cdot 2}p^{23}q^2 + \dots + q^{25}\right)$.

Thus, 5 successes occur

$36\cdot\frac{25\cdot 24 \cdots 6}{1\cdot 2\cdot 3 \cdots 20}\,p^5 q^{20}$

times, and this equals 7.06, or approximately 7.
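The calculated frequencies can be generated directly from the expansion. The sketch below is in Python; the name `expected_frequencies` is ours, and ordinary floating-point arithmetic is assumed to be accurate enough for two decimal places.

```python
from math import comb


def expected_frequencies(experiments, n, p):
    """Terms of experiments * (q + p)^n, indexed by the number of successes k."""
    q = 1 - p
    return [experiments * comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]


freq = expected_frequencies(36, 25, 1 / 5)
print(round(freq[5], 2))                  # 7.06
print([round(f) for f in freq[1:10]])     # [1, 3, 5, 7, 7, 6, 4, 2, 1]
```

The second line gives the whole-number frequencies for 1 to 9 successes; the terms for 0 and for 10 or more successes each round to nothing.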

The mean number of successes by theory $= np = 25/5 = 5$. The
mean by trial, since it is measured from zero as origin, the numbers
in col. (1) being the deviations,

$= \Sigma(fx)/\Sigma(f) = 162/36 = 4.5$.
The standard deviation by theory

$= \sqrt{npq} = \sqrt{25 \times \tfrac{1}{5} \times \tfrac{4}{5}} = 2$.

Table (36). Distribution of Successes (getting a 7 or 8) in
the Random Choice of 25 digits 36 times in succession.

(1)          (2)             (3)             (4)                  (5)
No. of       Frequency       Frequency       Product of Nos.      Product of Nos.
Successes.   Calculation.    Experiment.     in Cols. (1)&(3).    in Cols. (1)&(4).
(x)                          (f)
1            1               1               1                    1
2            3               5               10                   20
3            5               5               15                   45
4            7               7               28                   112
5            7               9               45                   225
6            6               4               24                   144
7            4               3               21                   147
8            2               0               0                    0
9            1               2               18                   162
Total        36              36              162                  856

By trial, the mean square deviation, measured from zero as origin

= 856/36.
Thus the S.D. by trial $= \sqrt{\Sigma(fx^2)/\Sigma(f) - \bar{x}^2}$,
where $\bar{x}$ is the deviation of the mean from the origin,

$= \sqrt{856/36 - (4.5)^2}$
= 1.88.

It will be seen that not one of the 36 experiments gave a number
of successes differing from 5, the theoretical mean, by more than
twice the S.D., for the number ranges only between 1 and 9.
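The mean and S.D. by trial follow from the observed column of Table (36) in a few lines. The sketch below is in Python; the list names and the helper `mean_and_sd` are introduced here for illustration, the figures being those of cols. (1) and (3).

```python
from math import sqrt

successes = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # col. (1)
observed = [1, 5, 5, 7, 9, 4, 3, 0, 2]    # col. (3), frequency by experiment


def mean_and_sd(xs, fs):
    """Mean and S.D. of a frequency table, both referred first to zero as origin."""
    total = sum(fs)
    mean = sum(x * f for x, f in zip(xs, fs)) / total
    mean_sq = sum(x * x * f for x, f in zip(xs, fs)) / total
    return mean, sqrt(mean_sq - mean ** 2)


mean, sd = mean_and_sd(successes, observed)
print(mean, round(sd, 2))   # 4.5 1.88
```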

If we treat the 900 digits as 900 separate experiments with one
event each, instead of treating them as 36 experiments containing
25 events each, we have 1/10 as the chance for the appearance of
any particular digit, and hence the number of times any digit may
be expected to appear

$= np \pm \tfrac{2}{3}\sqrt{npq}$, approximately
$= (900)\tfrac{1}{10} \pm \tfrac{2}{3}\sqrt{900 \times \tfrac{1}{10} \times \tfrac{9}{10}}$
$= 90 \pm 6$.
The actual number of occurrences of each digit was as follows :

Digit                 0    1    2    3     4    5    6    7    8    9
No. of Occurrences    95   96   93   105   91   80   82   72   90   96

so that the digit 7 showed the greatest divergence from 90 of any,
and this was only just three times the probable error.
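The comparison of each digit with its expected frequency can be set out as below. This is a sketch in Python; the dictionary `counts` simply transcribes the table of occurrences, and the other names are our own. The factor 0.6745 is used in place of the rounder 2/3, so the probable error comes out slightly above 6.

```python
from math import sqrt

counts = {0: 95, 1: 96, 2: 93, 3: 105, 4: 91, 5: 80, 6: 82, 7: 72, 8: 90, 9: 96}

n = sum(counts.values())   # 900 digits in all
p = 1 / 10                 # chance of any particular digit
expected = n / 10
probable_error = 0.6745 * sqrt(n * p * (1 - p))

# Divergence of each digit from 90, expressed in multiples of the probable error.
ratios = {digit: (c - expected) / probable_error for digit, c in counts.items()}
worst = max(ratios, key=lambda digit: abs(ratios[digit]))

print(n, expected, round(probable_error, 2))
print(worst, round(ratios[worst], 2))
```

The digit picked out is 7, with a divergence of very nearly three times the probable error, in agreement with the remark above.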

[The  Theory  of  Probability  is  older  than  that  of  Statistics.  Todhunter,  in 
his  History,  states  that  '  writers  on  the  subject  have  shown  a  justifiable  pride 
in  connecting  its  true  origin  with  the  great  name  of  Pascal.'  The  well-known 
story  of  the  latter  being  found,  as  a  lad  of  twelve,  tracing  out  on  the  hall  floor 
geometrical  propositions  which  he  had  evolved  in  his  own  head  is  not  to  be 
wondered  at,  nor  yet  that  at  sixteen  he  wrote  a  small  work  on  Conic  Sections, 
when  one  reflects  upon  the  fame  he  was  to  win  as  a  philosopher  and  writer, 
as  well  as  a  mathematician,  in  his  too  brief  life  of  thirty-nine  years.  He  was 
born  in  1623  of  a  distinguished  French  family,  and  for  the  last  half  of  his 
life  he  suffered  from  the  effects  of  a  serious  disease  which  contributed  to  turn 
his  attention  from  mathematics  to  religion  and  philosophy. 

We learn from Todhunter how a certain gentleman of repute at the gaming
tables set Pascal pondering on a question of probability concerning the fair
division of stakes between two players who give up their game before its
conclusion, an old problem cited in a work by Luca Pacioli as early as 1494. A
correspondence followed between him and Fermat, then probably the two most
distinguished mathematicians in Europe, and so began a science which has
fascinated at one time or another all great mathematicians from that day to
this.

The illustrious family of the Bernoullis, friends of Leibnitz, who championed
his claim against that made by English mathematicians on behalf of Newton
to the invention of the Calculus ; De Moivre, an exile in England, owing to
the revocation of the Edict of Nantes ; Euler, Lagrange, and Laplace, who
worked out in algebraical form Newton's theory of gravitation for the motion
of the planets : all these had a share in building up the science of Probability,
often by investigating problems in games of chance, where the conditions can
be made mathematically perfect, so by careful analysis preparing the way for
the use later of the same principles in matters of greater importance.

It has been said that the development of the subject owes more to Laplace
(1749-1827) than to any other mathematician ; nor did he confine himself to
its theory : he would have earned fame by his astronomical applications alone.
His method was to take certain observations, and to determine by means of
probability whether the abnormalities present were merely the results of chance
or whether there was some as yet undiscovered but constantly acting cause
behind the phenomena observed. In this way he was led to highly interesting
and important results such as those relating to the theory of the tides, the
effect of the spheroidal shape of the earth on the motion of the moon, the
irregularities of Jupiter and Saturn, and the laws which govern the motion
of Jupiter's moons. It needs but a step in thought to pass from the
discussion of such physical data to the statistics of social phenomena and the
causes which determine abnormalities met with in that field. Professor
Edgeworth, in making reference to books that have been written on Probability at
the end of his excellent article under that heading in the Encyclopædia
Britannica, remarks that 'as a comprehensive and masterly treatment of
the subject as a whole, in its philosophical as well as mathematical character,
there is nothing similar or second to Laplace's Théorie analytique des
probabilités.']


CHAPTER    XIII 
SAMPLING  {continued) — formula  for  probable  errors 


So far we have only considered the most simple case of random
sampling when we take a sample of n independent events each of
which falls into one of two classes according to its nature, the
chance of entering either class being the same for every event :
we have dealt, that is to say, more particularly with non-measurable
characters. We pass on now to measurable characters which are
distributed among several classes according to their size, so that a
frequency distribution table can be set up for each sample ; and
assuming that the population from which the samples are drawn is
homogeneous, the samples themselves containing each an adequate
number of individuals, there should not be greater differences between
one table and another than can be accounted for by random sampling.
It is our object to discover how great such differences may be.

Given a homogeneous population of N individuals which we will suppose
could be distributed into a number of groups, $Y_1$ individuals in the
first group, $Y_2$ in the second group, $Y_3$ in the third, and so on,
according to the size of the organ or character under observation.
Suppose a random sample of n individuals be taken from this
population, and when they are assigned to their several groups let the
frequency table now take the form shown, with $y_1$ individuals in the
first group, $y_2$ in the second, and so on. To find the probable error
of $y_k$, the frequency observed in the kth group.

GENERAL POPULATION.

Class.        Frequency.
1st Group     $Y_1$
2nd Group     $Y_2$
...           ...
kth Group     $Y_k$
...           ...
Total         N

SAMPLE.

Class.        Frequency.
1st Group     $y_1$
2nd Group     $y_2$
...           ...
kth Group     $y_k$
...           ...
Total         n

Consider the selection of the n individuals, one by one in succession,
to form the sample. When the first choice is made the probability
that we shall get an individual falling into the kth group is, by
definition, $Y_k/N$, and the probability will remain practically the same for
each successive choice granted that N is considerable. We have thus
n independent events, the chance of success (falling into the kth
group) for each being $p\ (= Y_k/N)$ and the chance of failure being
$q = 1 - Y_k/N$. The case is therefore analogous to the one previously
considered to which the binomial distribution is applicable, so that
the frequency to be expected in the kth group is np
with S.D., $\sigma_{y_k} = \sqrt{npq}$,
i.e. $y_k = np$ with a p.e. $= 0.6745\sqrt{npq}$.

Now in practice the numbers $Y_1$, $Y_2$, $Y_3$ ... would not be known,
and hence the true value of p would also be unknown, but since
$y_k = np$, approximately, when the sample is of adequate size, we
shall get a fair idea of the probable error involved by taking
$p = y_k/n$, where $y_k$ is the actual frequency observed in the kth group.

Hence, $\sigma_{y_k}^2 = npq = y_k(1-p) = y_k\left(1 - \frac{y_k}{n}\right)$,    (1)

and the frequency in the kth group

$= y_k \pm 0.6745\sqrt{y_k\left(1 - \frac{y_k}{n}\right)}$.    (2)
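By way of illustration, formulae (1) and (2) may be applied to a single group as in the following sketch (Python). The helper `group_frequency_error` and the figures used, 50 individuals in the group out of a sample of 400, are hypothetical and are not drawn from any table in the text.

```python
from math import sqrt


def group_frequency_error(y_k, n, factor=0.6745):
    """S.D. and probable error of an observed group frequency y_k, taking p = y_k / n."""
    sd = sqrt(y_k * (1 - y_k / n))     # formula (1): sigma^2 = y_k (1 - y_k / n)
    return sd, factor * sd             # formula (2): y_k +/- 0.6745 * sigma


sd, pe = group_frequency_error(50, 400)   # hypothetical sample values
print(round(sd, 2), round(pe, 2))
```

With these assumed figures the frequency would be stated as 50 with a probable error of about 4.5, and three times the S.D., about 20, gives the range beyond which random sampling alone would be an unlikely explanation.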


The size of the S.D. is under ordinary conditions a test of the
adequacy of the sample, for the frequency in the kth group, if due
simply to random sampling, should not differ from its expected value
by more than $\pm 3\sigma_{y_k}$, and $\sigma_{y_k}$ should therefore be
small compared with $y_k$ itself.

To find the correlation between the frequencies in any two
groups of a sample distribution.

Let the expected frequencies in the various groups of the
sample be denoted by $y_1, y_2, \dots, y_k, \dots$, and suppose an
error $\delta y_k$ in $y_k$ is associated with errors
$\delta y_1, \delta y_2, \dots, \delta y_s, \dots$ in $y_1, y_2, \dots, y_s, \dots$.
We require then the correlation between $y_k$ and $y_s$.

Class.        Expected        Observed
              Frequency.      Frequency.
1st Group     $y_1$           $y_1 + \delta y_1$
2nd Group     $y_2$           $y_2 + \delta y_2$
...           ...             ...
kth Group     $y_k$           $y_k + \delta y_k$
sth Group     $y_s$           $y_s + \delta y_s$
...           ...             ...
Total         n               n

Now although the group frequencies may change relative to one
another, the total sum of frequencies in all groups is not affected,
because the n individuals of the sample make up its composition in
each case : to keep n constant the group frequencies must adjust
themselves accordingly, which explains the correlation between
them. Hence to compensate for an excess, $\delta y_k$ (assuming $\delta y_k$ positive),
of frequency in any one group there must be a defect $(-\delta y_k)$ shared
among the other groups, and the fairest way of sharing will be in
proportion to the expected frequencies in the several groups.

But the total frequency divided between groups other than the
kth is $(n - y_k)$, so that the proportion of $(-\delta y_k)$ due to the sth group
is $y_s/(n - y_k)$, thus

$\delta y_s = \frac{y_s}{n - y_k}(-\delta y_k)$.

Therefore,

$\delta y_k \cdot \delta y_s = -\frac{y_s}{n}\cdot\frac{\delta y_k^2}{1 - y_k/n} = -\frac{y_k y_s}{n}\cdot\frac{\delta y_k^2}{\sigma_{y_k}^2}$,    (3)

by (1).


This gives the product moment of the deviations from $y_k$ and $y_s$
in one particular sample ; summing for all such samples, remembering
that by definition the coefficient of correlation between $y_k$
and $y_s$ is $r_{y_k y_s} = \Sigma(\delta y_k \cdot \delta y_s)/\nu\sigma_{y_k}\sigma_{y_s}$, where $\nu$ is the total number
of samples, also $\sigma_{y_k}^2 = \Sigma\delta y_k^2/\nu$, we have

$\nu\, r_{y_k y_s}\,\sigma_{y_k}\sigma_{y_s} = -\frac{y_k y_s}{n}\cdot\nu$.

Therefore,

$r_{y_k y_s} = -\frac{1}{n}\cdot\frac{y_k y_s}{\sigma_{y_k}\sigma_{y_s}}$    (4)

gives the correlation required.

To find the p.e. of the mean of a sample of n observations. Let a
frequency table be drawn up in the usual manner showing the
number of observations $y_1, y_2 \dots$ corresponding to organs of
different sizes $x_1, x_2 \dots$

FIRST SAMPLE.

Size of Organ          Frequency of      First Moment.     Second Moment.
or Character           Observations.
observed.
$x_1$                  $y_1$             $x_1y_1$          $x_1^2y_1$
$x_2$                  $y_2$             $x_2y_2$          $x_2^2y_2$
...                    ...               ...               ...
$x_k$                  $y_k$             $x_ky_k$          $x_k^2y_k$
...                    ...               ...               ...
Total                  n                 $\Sigma(xy)$      $\Sigma(x^2y)$

The mean referred to some fixed point as origin is then given by

$M = \Sigma(xy)/n$ ;

also the mean square deviation of the sample referred to the same
fixed point is $\mu_2'$, say, given by

$\mu_2' = \Sigma(x^2y)/n$ ;

and

$\mu_2' - M^2 = \sigma^2$,

where $\sigma$ is the S.D. of the sample.
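The three quantities M, $\mu_2'$ and $\sigma$ are readily computed from any frequency table. The sketch below is in Python; the function name `sample_moments` and the five-class table are hypothetical, chosen only to show the working.

```python
from math import sqrt


def sample_moments(sizes, freqs):
    """Return n, the mean M, the mean square deviation mu2 about the origin, and the S.D."""
    n = sum(freqs)
    first = sum(x * y for x, y in zip(sizes, freqs))        # sum of first moments
    second = sum(x * x * y for x, y in zip(sizes, freqs))   # sum of second moments
    M = first / n
    mu2 = second / n
    return n, M, mu2, sqrt(mu2 - M * M)


sizes = [1, 2, 3, 4, 5]    # hypothetical sizes x
freqs = [2, 5, 8, 4, 1]    # hypothetical frequencies y
n, M, mu2, sigma = sample_moments(sizes, freqs)
print(n, M, mu2, round(sigma, 3))
```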

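SECOND SAMPLE.

Size of Organ          Frequency of            First Moment.
or Character           Observations.
observed.
$x_1$                  $y_1 + \delta y_1$      $x_1(y_1 + \delta y_1)$
$x_2$                  $y_2 + \delta y_2$      $x_2(y_2 + \delta y_2)$
...                    ...                     ...
$x_k$                  $y_k + \delta y_k$      $x_k(y_k + \delta y_k)$
...                    ...                     ...
Total                  n                       $\Sigma[x(y + \delta y)]$

For another sample of the same size the frequency distribution
may be slightly different, say, $y_1 + \delta y_1$, $y_2 + \delta y_2$, ..., and
consequently the mean will also be different, say,

$M + \delta M = [x_1(y_1 + \delta y_1) + x_2(y_2 + \delta y_2) + \dots]/n$,

and, by subtraction,

$\delta M = (x_1\delta y_1 + x_2\delta y_2 + \dots)/n$    (5)

Now we want to determine the S.D. of the different values of M
found among the different samples, and that is given by

$\sigma_M^2 = \Sigma(\delta M)^2/\nu$,

where $\Sigma$ denotes summation for all samples and $\nu$ is the number of
samples. This suggests that we should square both sides of
equation (5), getting

$n^2(\delta M)^2 = x_1^2\,\delta y_1^2 + \dots + 2x_1x_2\,\delta y_1\delta y_2 + \dots$

Therefore, $n^2 \cdot \nu\sigma_M^2 = x_1^2\,\nu\sigma_{y_1}^2 + \dots + 2x_1x_2\left[-\frac{y_1y_2}{n}\cdot\nu\right] + \dots$,
by (3). Hence, making use also of (1),

$n^2\sigma_M^2 = x_1^2y_1\left(1 - \frac{y_1}{n}\right) + \dots - 2x_1x_2\frac{y_1y_2}{n} - \dots$

$= (x_1^2y_1 + \dots) - \frac{1}{n}(x_1^2y_1^2 + \dots + 2x_1y_1 \cdot x_2y_2 + \dots)$

$= n\mu_2' - \frac{1}{n}(x_1y_1 + \dots)^2$

$= n\mu_2' - nM^2$.

Thus $\sigma_M^2 = (\mu_2' - M^2)/n = \sigma^2/n$,

and the probable error of the mean $= 0.6745\,\sigma/\sqrt{n}$.    (6)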

The p.e. in the arithmetic mean found by taking a random sample
of n events is a measure, so to speak, of the failure to hit the absolute
mean, and it follows that the precision of the sample, the accuracy
of aim at the mean, would be not unfairly measured by some
quantity proportional to the reciprocal of the above expression,
namely, $\sqrt{n}/0.6745\sigma$. With such a measure the precision would
evidently be increased if the number of observations in the sample
were increased, being proportional to the square root of their
number.
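The results (1), (3) and (6) may be tested on a small case by enumerating every possible sample exactly. The sketch below is in Python with exact fractions; it assumes a population of three groups with known proportions 1/2, 1/3, 1/6 and sizes 1, 2, 4 (all hypothetical), samples of n = 6, and a population so large that each choice is independent. Unlike the text, which must estimate p from the sample, the sketch uses the population proportions themselves, so the agreement is exact rather than approximate.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def outcomes(n, probs):
    """Yield (counts, probability) for every way n individuals can fall into the groups."""
    for counts in product(range(n + 1), repeat=len(probs)):
        if sum(counts) != n:
            continue
        ways = factorial(n)
        chance = Fraction(1)
        for c, p in zip(counts, probs):
            ways //= factorial(c)          # builds n!/(c1! c2! ...)
            chance *= p ** c
        yield counts, ways * chance


def expect(n, probs, func):
    """Exact average of func(counts) over all possible samples."""
    return sum(chance * func(counts) for counts, chance in outcomes(n, probs))


n = 6
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # assumed proportions Y_k / N
sizes = [1, 2, 4]                                          # assumed sizes x_k
y = [n * p for p in probs]                                 # expected frequencies

var_k = expect(n, probs, lambda c: (c[0] - y[0]) ** 2)
cov_ks = expect(n, probs, lambda c: (c[0] - y[0]) * (c[1] - y[1]))


def sample_mean(c):
    return Fraction(sum(x * f for x, f in zip(sizes, c)), n)


pop_mean = sum(x * p for x, p in zip(sizes, probs))
pop_var = sum(x * x * p for x, p in zip(sizes, probs)) - pop_mean ** 2
var_mean = expect(n, probs, lambda c: (sample_mean(c) - pop_mean) ** 2)

print(var_k, cov_ks, var_mean, pop_var / n)
```

The first figure equals $y_k(1 - y_k/n)$, the second equals $-y_ky_s/n$, and the last two coincide, which is the statement $\sigma_M^2 = \sigma^2/n$.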

[* We do not know the true mean for the population as a whole, but we take
in place of it M, the value given by the sample, which we may do with little
error if n is large. Similarly $\sigma$ is the S.D. of the sample.]

[It is desirable to draw a distinction here between what have been
termed biassed errors and unbiassed errors ; errors due to random
sampling are of the second class for there is, by hypothesis, no
reason why they should be in one direction rather than in another.
Biassed errors, however, all tend to be in the same direction and
they may arise in different ways, e.g. they may be due to faults of
omission or commission on the part of the observer himself : he
observes either carelessly or badly, omitting certain factors which
ought to be taken into account, or so measuring or classifying his
results that they appear always larger or less than they really are
in fact.

Sometimes, although the bias is known to exist, it may be
impossible to correct it : the most one can do is to bear it in mind
and allow for it in using the results. A familiar example of this
occurs in the collection of household budgets from the poor to find
their standard of living, where it is only possible to get particulars
from the more intelligent and thrifty class among them.

Whereas in the case of unbiassed errors due to random sampling
we can diminish the probable error of the average by increasing
the number of observations, the same is not true of errors which
are biassed, for suppose an error e in excess be made in each of
n observations $x_1, x_2, \dots, x_n$, the effect upon the average is to
increase it from

$\frac{x_1 + x_2 + \dots + x_n}{n}$ to $\frac{(x_1+e) + (x_2+e) + \dots + (x_n+e)}{n}$,

i.e. from

$\frac{\Sigma x}{n}$ to $\frac{\Sigma x}{n} + e$,

so that the average is over-estimated by precisely the same amount.
If,  therefore,  we  know  that  bias  exists,  it  is  well,  if  possible,  to 
correct  it  in  each  observation,  for  by  so  doing  we  change  biassed 
into  unbiassed  errors,  and  though  our  corrections  may  be  somewhat 
wide  of  the  mark,  the  resultant  error  will  then  be  diminished  by 
increasing  the  number  of  observations  :  e.g.  a  farmer  offers  400 
sheep  for  sale  and,  being  anxious  to  make  a  good  bargain,  he  asks 
a  higher  figure  for  them  than  he  is  in  reality  prepared  to  take  ; 
let  us  suppose  that  this  excess  is  2s.  6d.  for  each  sheep,  then  clearly 
the  average  price  per  sheep  at  which  he  is  prepared  to  sell  will  be 
less  than  the  amount  he  asks  by  2s.  6d.  also.  But  now  suppose  the 
buyer,  a  simple  person  knowing  little  of  the  prices  of  sheep  and 
less  of  the  ways  of  men,  goes  through  the  flock  one  by  one  and 
makes  the  error  of  offering  a  price  either  much  above  or  much  below 
what the seller is prepared to take; even if his unbiassed offers
differ by as much as 10s. for each sheep from the seller's reserve
price, so long as they are random in direction, i.e. sometimes too
much and sometimes too little, the resultant difference in the
average from what the seller is prepared to take will probably not
greatly exceed $\tfrac{2}{3} \times 10\text{s.}/\sqrt{400}$, or 4d. per sheep.
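
The contrast between the two kinds of error in the sheep illustration can be checked numerically. The sketch below is only an illustration under stated assumptions: prices are expressed in pence, the buyer's error is taken at the full 10s. with its sign chosen at random, and the function names and the seed are invented for the purpose.

```python
import math
import random

def error_in_average(errors):
    """Error carried into the average: the mean of the individual errors."""
    return sum(errors) / len(errors)

def simulate_sheep_sale(n=400, bias_pence=30, spread_pence=120, seed=1):
    """Compare a constant bias with errors that are random in direction.

    bias_pence   -- the seller's excess of 2s. 6d. on every sheep (assumed units: pence)
    spread_pence -- the buyer's error of 10s. on every sheep, sign chosen at random
    """
    rng = random.Random(seed)
    biased = [bias_pence] * n
    unbiased = [rng.choice((-1, 1)) * spread_pence for _ in range(n)]
    return error_in_average(biased), error_in_average(unbiased)

biased_error, unbiased_error = simulate_sheep_sale()
probable_error = 0.6745 * 120 / math.sqrt(400)   # roughly 4d., as in the text
print(biased_error, unbiased_error, round(probable_error, 2))
```

The biassed error survives averaging untouched at 30 pence, while the unbiassed error in the average is of the order of a few pence only.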

We  can  sometimes  diminish  the  effect  of  bias,  even  when  its 
extent  is  unknown,  by  working  with  the  ratios  of  the  quantities 
affected instead of with the quantities themselves: e.g. suppose
biassed errors, $e_1$ and $e_2$, enter into the measurement of the variables
$x_1$ and $x_2$, both in excess, the ratio of the variables then

$$= \frac{x_1+e_1}{x_2+e_2}$$

$$= x_1\left(1+\frac{e_1}{x_1}\right) \Big/ x_2\left(1+\frac{e_2}{x_2}\right)$$

$$= \frac{x_1}{x_2}\left(1+\frac{e_1}{x_1}\right)\left(1-\frac{e_2}{x_2}+\text{higher powers of } e_2\right)$$

$$= \frac{x_1}{x_2}\left(1+\frac{e_1}{x_1}-\frac{e_2}{x_2}\right),$$

if we omit higher powers of $e_1$ and $e_2$ than the first on the
understanding that they are both comparatively small. Suppose, for
example, there was an error of 5 per cent. made in measuring $x_1$
and an error of 3 per cent. of like sign in measuring $x_2$, then the
resulting error in $x_1/x_2$ would be 5 per cent. - 3 per cent. = 2 per cent.
Clearly  the  same  holds  good  also  if  the  errors  are  both  in  defect. 
This  explains  why  a  comparison  of  results  arranged,  say,  on  the 
index  number  principle  may  be  trustworthy,  although  the  method 
of  formation  of  the  numbers  themselves  may  be  in  some  respects 
faulty, granted that the same faults are repeated each year so as
to produce like errors, i.e. the bias is to be unchanged in character.
To correct the faults in one case and not in the other would prejudice
the success of the method, since it depends upon the errors
counteracting one another.]
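
A short numerical check of the first-order rule for ratios may help. The sketch below assumes illustrative true values of 200 and 80 for the two variables (not taken from the text) and compares the exact relative error of the ratio with the approximation obtained by subtracting the two percentage errors; the function name is hypothetical.

```python
def ratio_error(x1, x2, rel1, rel2):
    """Return (exact, first-order) relative error of x1/x2 when x1 and x2
    are overstated by the relative amounts rel1 and rel2 respectively."""
    true_ratio = x1 / x2
    measured_ratio = (x1 * (1 + rel1)) / (x2 * (1 + rel2))
    exact = measured_ratio / true_ratio - 1
    first_order = rel1 - rel2          # higher powers of the errors omitted
    return exact, first_order

# 5 per cent. error in x1 and 3 per cent. error of like sign in x2
exact, approx = ratio_error(200.0, 80.0, 0.05, 0.03)
print(f"exact {exact:.4f}, first-order {approx:.4f}")
```

The exact figure falls a little short of 2 per cent. because of the neglected higher powers, but the approximation is close while both errors remain small.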

Example (1). To illustrate the important result we have obtained
for the p.e. of the mean of n observations let us return to the
experiment of selecting 900 random digits. The distribution actually
obtained, and the theoretical distribution to be expected in the
long run if the experiment were repeated several hundred times and
the average taken, are shown in the following table:


Table (37). Distribution of 900 Random Digits.

Digit.   Frequency    Theoretical       Digit.   Frequency    Theoretical
         Observed.    Frequency.                 Observed.    Frequency.

  0          95           90              5          80           90
  1          96           90              6          82           90
  2          93           90              7          72           90
  3         105           90              8          90           90
  4          91           90              9          96           90

It is a simple matter to calculate the mean and S.D. for the
distribution from this table in the usual way; the results are:

Observed mean = 4.38,       S.D. = 2.911
Theoretical mean = 4.50,    S.D. = 2.872.

Thus the p.e. of the mean based on the sample

$$= \pm 0.6745 \times 2.911/\sqrt{900} = \pm 0.065,$$

and 4.38 differs from 4.50 by less than three times the p.e.

The 36 averages of samples of 25 events apiece were also
calculated, and the following were the results obtained:

2.76, 3.32, 3.68, 3.72, 3.72, 3.72, 3.76, 3.80, 3.92, 3.92, 4.08, 4.12,
4.16, 4.16, 4.16, 4.28, 4.36, 4.40, 4.40, 4.40, 4.44, 4.60, 4.64, 4.68,
4.72, 4.72, 4.76, 4.88, 4.96, 5.00, 5.00, 5.00, 5.08, 5.28, 5.40, 5.72.

The mean of this distribution = 157.72/36 = 4.381, and the
S.D. = 0.612. But the S.D. of the whole distribution of 900 digits
= 2.911, and therefore the S.D. of the distribution of averages of
samples of 25 digits should be $2.911/\sqrt{25} = 0.582$, differing from
0.612 by about 5 per cent.
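
The arithmetic of this example is easily reproduced. The following sketch recomputes the mean, S.D. and p.e. from the observed frequencies of Table (37) and from the 36 sample averages; the variable and function names are illustrative only, and the S.D. is taken with divisor n, which is the assumption that reproduces the printed figures.

```python
import math
import statistics

observed = [95, 96, 93, 105, 91, 80, 82, 72, 90, 96]   # frequencies of digits 0..9

def mean_and_sd(freqs):
    """Mean and S.D. (divisor n) of a frequency table indexed by digit."""
    n = sum(freqs)
    mean = sum(d * f for d, f in enumerate(freqs)) / n
    var = sum(f * (d - mean) ** 2 for d, f in enumerate(freqs)) / n
    return n, mean, math.sqrt(var)

n, mean, sd = mean_and_sd(observed)
pe_mean = 0.6745 * sd / math.sqrt(n)      # p.e. of the mean of 900 digits
sd_expected_25 = sd / math.sqrt(25)       # expected S.D. of averages of 25 digits

averages = [2.76, 3.32, 3.68, 3.72, 3.72, 3.72, 3.76, 3.80, 3.92, 3.92, 4.08, 4.12,
            4.16, 4.16, 4.16, 4.28, 4.36, 4.40, 4.40, 4.40, 4.44, 4.60, 4.64, 4.68,
            4.72, 4.72, 4.76, 4.88, 4.96, 5.00, 5.00, 5.00, 5.08, 5.28, 5.40, 5.72]
mean_of_averages = statistics.mean(averages)
sd_of_averages = statistics.pstdev(averages)   # observed S.D. of the 36 averages

print(round(mean, 2), round(sd, 3), round(pe_mean, 3))
print(round(mean_of_averages, 3), round(sd_of_averages, 3), round(sd_expected_25, 3))
```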

To find the p.e. of the sum or difference of two variables. Let the
mean values of the two variables be denoted by $y$ and $z$, so that
deviations from these values found in a particular sample may be
denoted by $\delta y$ and $\delta z$. If then we write

$$u = y + z$$

we have

$$\delta u = \delta y + \delta z \qquad (7)$$


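To find the S.D. of $u$ we therefore require $\Sigma(\delta u^2)/\nu$, where the
$\Sigma$ denotes summation for all samples and $\nu$ is the number of samples.
But, squaring both sides of equation (7), we have

$$\delta u^2 = \delta y^2 + \delta z^2 + 2\,\delta y\,\delta z.$$

Thus

$$\Sigma\,\delta u^2 = \Sigma\,\delta y^2 + \Sigma\,\delta z^2 + 2\,\Sigma(\delta y\,\delta z),$$

where the summation extends to all samples. Hence

$$\nu\sigma_u^2 = \nu\sigma_y^2 + \nu\sigma_z^2 + 2\nu\, r_{yz}\sigma_y\sigma_z$$

or

$$\sigma_u^2 = \sigma_y^2 + \sigma_z^2 + 2\, r_{yz}\sigma_y\sigma_z,$$

where $r_{yz}$ is the correlation between the variables. And the
p.e. $= 0.6745\,\sigma_u$.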

The p.e. of the difference of two variables follows at once by
changing the sign of $z$ throughout; for, if

$$v = y - z,$$

we have

$$\delta v^2 = \delta y^2 + \delta z^2 - 2\,\delta y\,\delta z,$$

and

$$\sigma_v^2 = \sigma_y^2 + \sigma_z^2 - 2\, r_{yz}\sigma_y\sigma_z.$$

Generally, if $x_1, x_2, \ldots, x_n$ be the mean values of $n$ variables,
and if $\delta x_1, \delta x_2, \ldots, \delta x_n$ denote deviations from these values in
a particular sample, we may write

$$u = x_1 + x_2 + \cdots + x_n$$

and

$$\delta u = \delta x_1 + \delta x_2 + \cdots + \delta x_n.$$

Thus

$$\Sigma\,\delta u^2 = \Sigma\,\delta x_1^2 + \cdots + 2\,\Sigma(\delta x_1\,\delta x_2) + \cdots$$

whence

$$\sigma_u^2 = \sigma_1^2 + \cdots + 2\, r_{12}\sigma_1\sigma_2 + \cdots$$

Important Corollary. If $y$ and $z$ are quite independent so that
$r_{yz}$ is zero, the p.e. of their sum and the p.e. of their difference
have the same value, namely, the square root of the sum of the
squares of the p.e.'s of $y$ and $z$ themselves, which

$$= 0.6745\sqrt{\sigma_y^2 + \sigma_z^2} \qquad (8)$$
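
The corollary can be illustrated by a small simulation. In the sketch below two independent normal variables with S.D.'s 3 and 4 are drawn (these values, the sample size and the seed are arbitrary assumptions), and the S.D. of their sum and of their difference are both found to lie close to $\sqrt{3^2+4^2}=5$.

```python
import math
import random
import statistics

def sd_of_sum_and_difference(sd_y=3.0, sd_z=4.0, samples=20000, seed=7):
    """Simulate independent y and z; return the S.D. of y+z and of y-z."""
    rng = random.Random(seed)
    ys = [rng.gauss(0.0, sd_y) for _ in range(samples)]
    zs = [rng.gauss(0.0, sd_z) for _ in range(samples)]
    sums = [y + z for y, z in zip(ys, zs)]
    diffs = [y - z for y, z in zip(ys, zs)]
    return statistics.pstdev(sums), statistics.pstdev(diffs)

sd_sum, sd_diff = sd_of_sum_and_difference()
expected = math.sqrt(3.0 ** 2 + 4.0 ** 2)     # square root of the sum of squares
print(round(sd_sum, 2), round(sd_diff, 2), expected)
```

Multiplying either S.D. by 0.6745 gives the corresponding p.e. of result (8).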

This result is exceedingly important, because it can be directly
used to test whether a difference between two samples is accidental,
i.e. whether it is such as might arise through sampling, or whether
it implies a real difference between the two populations from which
the samples are selected. An example will illustrate the
procedure:

Example (2). In a study of Minimum Rates in the Tailoring
Industry by R. H. Tawney, a table is given (p. 114) which suggests
that 'in the north of England women work in the tailoring trade
when they are young ... in London and Colchester they have
to work when they are older.' Taking some figures from that
table we find:


District.                  Workers over      Workers at      Proportion
                           35 years old.     all ages.       over 35.

London and Essex              11,718           35,316          0.332
Manchester and Leeds           4,029           21,822          0.185


The difference between the proportions over 35 years of age
$= (0.332 - 0.185) = 0.147$.

Let us suppose for the moment that this difference is not significant
of any real difference in conditions between the two districts, but
is merely due to random sampling. In that case the most natural
value to assign to the true proportion of women workers over 35
for the trade as a whole, as given by these figures, would be

$$\frac{11{,}718 + 4{,}029}{35{,}316 + 21{,}822} = \frac{15{,}747}{57{,}138} = 0.276.$$

The S.D. for the first sample (London and Essex) would then be

$$\sigma_1 = \sqrt{pq/n} = \sqrt{0.276 \times 0.724/35{,}316},$$

and for the second sample (Manchester and Leeds) would be

$$\sigma_2 = \sqrt{0.276 \times 0.724/21{,}822}.$$

Hence the p.e. for the difference between the proportions in the
two samples would be roughly

$$= \tfrac{2}{3}\sqrt{\sigma_1^2 + \sigma_2^2}, \text{ by (8)},$$

$$= \tfrac{2}{3}\sqrt{0.276 \times 0.724\,(1/35{,}316 + 1/21{,}822)}$$

$$= \tfrac{2}{3}\sqrt{0.276 \times 0.724/13500}$$

$$= 0.0026.$$

The actual difference between the proportions, 0.147, being much
more than 3(0.0026), is certainly significant of a greater difference
between the two populations than can be explained by random
sampling alone.

Another method of attack would be to assume a real difference
between the two populations, if other considerations led us to
suspect such a difference, and to find whether such a difference could
be disguised by random sampling. In that case the proper
proportion to assume for the first sample would be 0.332, giving

$$\sigma_1 = \sqrt{0.332 \times 0.668/35{,}316} = \sqrt{628}/10^4,$$

and for the second sample the proportion would be 0.185, giving

$$\sigma_2 = \sqrt{0.185 \times 0.815/21{,}822} = \sqrt{691}/10^4.$$

Hence the p.e. for the difference between these two proportions
due to random sampling would be

$$= \tfrac{2}{3}\sqrt{\sigma_1^2 + \sigma_2^2}, \text{ by (8)},$$

$$= \tfrac{2}{3} \cdot \frac{1}{10^4}\sqrt{628 + 691}$$

$$= 0.0024.$$

The actual difference is 0.147, which certainly could not be
outbalanced by an error in the opposite direction due to random
sampling, because it is much more than three times the probable
error due to sampling.
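
Both methods of attack in this example reduce to a few lines of arithmetic. The sketch below recomputes the two probable errors from the counts in the table; the function names are illustrative, and the factor 0.6745 is used in place of the rougher 2/3.

```python
import math

def pe_difference_pooled(a1, n1, a2, n2):
    """p.e. of the difference of two proportions, assuming a common true
    proportion estimated from both samples together (first method)."""
    p = (a1 + a2) / (n1 + n2)
    return 0.6745 * math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

def pe_difference_separate(a1, n1, a2, n2):
    """p.e. of the difference, each sample keeping its own proportion
    (second method)."""
    p1, p2 = a1 / n1, a2 / n2
    return 0.6745 * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

london = (11718, 35316)        # workers over 35, workers at all ages
manchester = (4029, 21822)

difference = london[0] / london[1] - manchester[0] / manchester[1]
pe_pooled = pe_difference_pooled(*london, *manchester)
pe_separate = pe_difference_separate(*london, *manchester)
print(round(difference, 3), round(pe_pooled, 4), round(pe_separate, 4))
```

On either footing the observed difference is more than fifty times its probable error.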

Sometimes we have to test the difference, not between two
simple proportions, but between two sample distributions. In
that case the mean of each sample may be calculated so that the
difference $(M_1 - M_2)$ between the means is known; to find out
whether or not it is significant of some real difference between the
two populations from which the samples are drawn, $(M_1 - M_2)$
is compared with its p.e., namely

$$0.6745\sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2},$$

or

$$0.6745\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} \qquad (9)$$

where $n_1$ and $n_2$ are the numbers of observations in the two samples
respectively, and $\sigma_1$, $\sigma_2$ are the S.D.'s of the samples. Unless
$(M_1 - M_2)$ is definitely greater than some two or three times this
expression we cannot be very sure that the difference between $M_1$
and $M_2$ may not have arisen merely through random sampling,
and it may quite likely not be significant * of any real difference
between the two populations as regards the organ or character
which is under consideration.

[* It should be observed that the S.D. provides a wider margin for significance
than the p.e., because a range of approximately 3 p.e. $= 3 \times \tfrac{2}{3}\sigma = 2\sigma$ only. It is
quite safe therefore to attach no great significance to a difference which does
not exceed two or three times the p.e.]

Example (3). Statistics have been collected to test whether there
is any significant difference between the eggs laid in general by
cuckoos and those laid by them in the nests of particular species
of foster parents. Results of the following kind were obtained
[see Biometrika, vol. iv., pp. 363-373, The Egg of Cuculus Canorus
(2nd Memoir), by O. H. Latter]:


Group.                                 Number     Mean Length   S.D.       Significance   Remarks.
                                       of Eggs.   (mms.)        (mms.)     Test.

Eggs of the Cuckoo race in general      1572        22.3        0.9642       ..           ..
Eggs laid in nests of:
  Garden Warbler                          91        21.9        0.7860       7.0          Significant.
  White Wagtail                          115        22.4        0.7606       1.6          Not significant.
  Hedge Sparrow                           58        22.6        0.8759       3.75         Probably significant.

The difference between the mean lengths of eggs laid in the nests
of garden warblers and those laid by cuckoos in general

$$= 22.3 - 21.9 = 0.4 \text{ mms.}$$

The p.e. of this difference

$$= 0.6745\sqrt{(0.7860)^2/91 + (0.9642)^2/1572}, \text{ by (9)},$$

$$= 0.6745\sqrt{0.007380}$$

$$= 0.058.$$

Hence the significance test

$$= 0.4/0.058 = 7.0,$$

and we conclude that the difference in length between the two
classes of eggs is certainly significant. Similarly the other cases
may be tested.
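
The garden warbler calculation may be sketched as follows; the function name is illustrative. Only this row is reproduced, since the means in the table are rounded to one decimal place and the other significance figures need not be recovered exactly from them.

```python
import math

def significance_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (p.e. of the difference of two means, difference / p.e.),
    using result (9) with the sample S.D.'s."""
    pe = 0.6745 * math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return pe, abs(mean_a - mean_b) / pe

# cuckoo eggs in general against those laid in garden warblers' nests
pe, ratio = significance_test(22.3, 0.9642, 1572, 21.9, 0.7860, 91)
print(round(pe, 3), round(ratio, 1))
```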

In  the  example  just  given,  to  find  out  whether  one  population 
differed  from  another,  the  arithmetic  means  have  been  compared  ; 
but  the  mean  alone  will  scarcely  serve  to  establish  the  identity  of 
any  population.  For  example,  we  can  conceive  of  two  distinct 
races  of  men,  both  of  the  same  mean  height,  but  one  race  embracing 
a  number  of  giants  and  dwarfs.  Of  course  if  we  agreed  to  define 
two  races  as  identical  when  they  have  the  same  mean  heights,  there 
would  be  nothing  more  to  be  said,  but  that  would  certainly  only 
be  a  very  rough-and-ready  attempt  at  classification. 

Taking into consideration only the character of height, a further
step in definition would be to measure the mode or most fashionable
height, and the dispersion or variability (absolute: the standard
deviation, and relative: the coefficient of variation) of the two
races. Then, after comparing heights with sufficient detail, the
attention could be turned to innumerable other characters, skull
and body measurements, physical, mental, and even moral
attributes.

Clearly  the  difficulty  of  definition  and  of  establishment  of  identity 
grows  as  we  pass  along  the  scale  from  physical  to  moral.  Moreover, 
other  statistical  constants  must  be  requisitioned  when  the  question 
of  the  existence  and  degree  of  relationship  between  two  organs  or 
characters  is  to  be  determined.  As  the  S.D.  and  the  C.  of  V.  serve 
to  measure  the  amount  of  variability,  so  the  coefficient  of  correlation 
comes  in  to  measure  the  amount  of  likeness  or  association.  Further, 
and especially in problems of inheritance, the coefficient of
regression must be measured. It might seem at first sight hopeless to
try and measure the correlation between two such characters as
athletic capacity and health in the same boy, or between the
truthfulness of one boy and that of his brother; but the genius of
Karl Pearson has gone some way to solve even this difficult problem
by means of a system of adjectival instead of numerical
classification [see Phil. Trans., vol. 195a, pp. 1-47, On the Correlation of
Characters not Quantitatively Measurable, and, as an exceptionally
interesting application of the method, see Pearson, On the Laws of
Inheritance in Man, ii.; On the Inheritance of the Mental and Moral
Characters in Man and its Comparison with the Inheritance of the
Physical Characters; Biometrika, vol. iii. pp. 131-190]. In short,
for a full and exact definition of a population of any kind, human
or otherwise, it is necessary to measure not only the means, but all
the more important statistical constants, modes, medians, S.D.'s,
C.'s of V., coefficients of correlation and regression, and so on, and
it  is  no  less  necessary  to  calculate  also  their  probable  errors  if  we 
are  to  test  the  real  significance  of  such  differences  as  are  observed 
in  these  constants  between  two  samples  from  the  same  or  from 
different  populations. 

The  probable  errors  for  the  more  important  constants,  some  of 
which  are  only  introduced  later  in  the  book,  are  collected  together 
in  Table  (38)  for  reference.  The  proofs  in  general  are  a  little  intricate 
and  would  be  lacking  in  interest  to  the  ordinary  person,  who  is 
satisfied to take algebraical analysis on trust so long as he
understands the nature of the results he uses, but the more mathematical
reader who is anxious to see proofs may refer for some of them to
Biometrika, vol. ii., pp. 273-281, Editorial, On the Probable Errors
of Frequency Constants, which has been freely consulted on the
subject here.

The usual notation is adopted, $n$ being the total number of
observations in the given distribution, supposed normal in general,
$\sigma$ the S.D., etc.


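Table (38). Probable Errors of Statistical Constants.

Statistical Constant.                                          Probable Error (= 0.6745 S.D.).

Any observed group frequency, $y$                              $0.6745\sqrt{y(1-y/n)}$
The mean of a distribution of any type                         $0.6745\,\sigma/\sqrt{n}$
The S.D. of a normal distribution, $\sigma$                    $0.6745\,\sigma/\sqrt{2n}$
The second moment about the mean, $\mu_2$                      $0.6745\,\sigma^2\sqrt{2/n}$
The third moment about the mean, $\mu_3$                       $0.6745\,\sigma^3\sqrt{6/n}$
The fourth moment about the mean, $\mu_4$                      $0.6745\,\sigma^4\sqrt{96/n}$
The coefficient of variation, $v$                              $0.6745\,\dfrac{v}{\sqrt{2n}}\left[1+2\left(\dfrac{v}{100}\right)^2\right]^{1/2}$
The coefficient of correlation, $r$                            $0.6745\,(1-r^2)/\sqrt{n}$
The correlation ratio, $\eta$                                  $0.6745\,(1-\eta^2)/\sqrt{n}$, nearly
$X$, as determined from
  $(X-\bar{X}) = r\dfrac{\sigma_x}{\sigma_y}(Y-\bar{Y})$, when $Y$ is given      $0.6745\,\sigma_x\sqrt{1-r^2}$
$Y$, as determined from
  $(Y-\bar{Y}) = r\dfrac{\sigma_y}{\sigma_x}(X-\bar{X})$, when $X$ is given      $0.6745\,\sigma_y\sqrt{1-r^2}$
Distance between mode and mean in a skew distribution          $0.6745\,\sigma\sqrt{3/2n}$
Skewness                                                       $0.6745\sqrt{3/2n}$
$\beta_2$ (which should = 3 for a normal distribution)         $0.6745\sqrt{24/n}$
$\beta_1$ (which should = 0 for a normal distribution)         $0.6745 \times 0$
$\sqrt{\beta_1}$                                               $0.6745\sqrt{6/n}$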

Example (4). In the example which follows are given data
necessary for testing the significance of differences in variability
as well as in mean values. They represent an attempt made to
find whether members of a particular species of crab caught in
shallow water differed with regard to certain characteristics from
those caught in comparatively deep water [see Biometrika, vol. ii.,
pp. 191 et seq., Variation in Eupagurus Prideauxi, by E. H. J.
Schuster]. Only a few of the results are recorded here, to two
decimal places; the reader will find it a valuable exercise to verify
for himself the p.e.'s given in each case.

Measurement Made: Carapace length.

Sex.      Locality.        Mean (mm.).     S.D. (mm.).     C. of V. per cent.

Male      Deep water       8.59 ± 0.05     1.67 ± 0.04     19.45 ± 0.44
Male      Shallow water    8.41 ± 0.04     1.49 ± 0.03     17.75 ± 0.37
Female    Deep water       7.54 ± 0.03     0.94 ± 0.02     12.49 ± 0.28
Female    Shallow water    7.12 ± 0.02     0.86 ± 0.02     12.12 ± 0.25

Sex.      Difference of Means (mm.).   Difference of S.D.'s (mm.).   Difference of C.'s of V. per cent.

Male      0.18 ± 0.07 (poss. sig.)     0.18 ± 0.05 (prob. sig.)      1.70 ± 0.58 (poss. sig.)
Female    0.42 ± 0.04 (sig.)           0.08 ± 0.03 (poss. sig.)      0.37 ± 0.37 (not sig.)

The significance or otherwise of differences between variabilities
in the case of cuckoos' eggs (Example (3)) might be tested in the same way.
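
The differences in the crab example follow from result (8), the p.e. of a difference being the square root of the sum of the squares of the two p.e.'s. A minimal sketch, with an illustrative function name, for two of the entries:

```python
import math

def difference_with_pe(a, pe_a, b, pe_b):
    """Difference a - b of two independent constants, with its p.e. by (8)."""
    return a - b, math.hypot(pe_a, pe_b)

# male crabs: coefficient of variation, deep against shallow water
cv_diff, cv_pe = difference_with_pe(19.45, 0.44, 17.75, 0.37)
# female crabs: mean carapace length, deep against shallow water
mean_diff, mean_pe = difference_with_pe(7.54, 0.03, 7.12, 0.02)
print(round(cv_diff, 2), round(cv_pe, 2), round(mean_diff, 2), round(mean_pe, 2))
```

Since the tabulated p.e.'s are themselves rounded to two places, agreement with the printed differences can only be expected to the last figure or so.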


CHAPTER  XIV 


FURTHER   APPLICATIONS    OF    SAMPLING   FORMULA 


We  have  been  discussing  in  the  last  chapter  how  to  test  two  samples, 
supposed  each  to  contain  homogeneous  material,  to  find  out  whether 
they  belong  to  the  same  or  to  different  types  of  population,  but 
the  further  question  often  arises  as  to  whether  a  sample  is  or  is  not 
homogeneous.

Example  (1). — To  this  we  may  obtain  a  partial  answer  by  working 
out  the  statistical  constants  of  the  sample  and  their  p.e.'s  in  order 
to  compare  them  with  the  corresponding  constants  for  a  sample  or 
series  of  samples  believed  to  be  homogeneous  and  of  the  same 
type.  For  example,  Professor  Karl  Pearson  has  measured  the 
skulls  of  skeletons  of  the  Naqada  race,  excavated  in  Upper  Egypt 
by  Professor  Flinders  Petrie  and  presumed  to  be  some  8000  years 
old,  and  he  places  his  results  for  comparison  alongside  those 
for  certain  other  races  admittedly  homogeneous  [see  Biometrika, 
vol.  ii.,  p.  345,  Homogeneity  and  Heterogeneity  in  Collections  of 
Crania] : — 


                                           Number of          Variability (mm.).
Series.                                    Observations.    Skull Length.   Skull Breadth.

Skulls:
  Ainos                                        76              5.936           3.897
  Bavarians                                   100              6.088           5.849
  Parisians                                    77              5.942           5.214
  Naqadas                                     139              5.722           4.612
  English                                     136              6.085           4.976
Living heads:
  Cambridge undergraduates                   1000              6.161           5.055
  English criminals                          3000              6.046           5.014
  Oraons of Chota Nagpur                      100              5.916           4.397

Mean Variability                                               5.987           4.877

The S.D. of the variability of skull length calculated from this
series = 0.129 mm. and of the variability of skull breadth = 0.545 mm.,
and these supply standards for valuing the differences between the
Naqada and the mean variabilities.
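
The standards quoted here can be recomputed from the eight series. In the sketch below the breadth figures for the Bavarians, Cambridge undergraduates and English criminals are taken as 5.849, 5.055 and 5.014, these being the readings that reproduce the stated mean of 4.877 and S.D. of 0.545; the variable names are illustrative.

```python
import statistics

# variability (S.D. in mm.) of skull length and breadth for the eight series
length = {"Ainos": 5.936, "Bavarians": 6.088, "Parisians": 5.942, "Naqadas": 5.722,
          "English": 6.085, "Cambridge": 6.161, "Criminals": 6.046, "Oraons": 5.916}
breadth = {"Ainos": 3.897, "Bavarians": 5.849, "Parisians": 5.214, "Naqadas": 4.612,
           "English": 4.976, "Cambridge": 5.055, "Criminals": 5.014, "Oraons": 4.397}

def standard(series):
    """Mean variability and the S.D. (divisor n) of the variabilities."""
    values = list(series.values())
    return statistics.mean(values), statistics.pstdev(values)

mean_len, sd_len = standard(length)
mean_br, sd_br = standard(breadth)

# how far the Naqada variabilities lie from the mean, in units of that S.D.
naqada_len = (length["Naqadas"] - mean_len) / sd_len
naqada_br = (breadth["Naqadas"] - mean_br) / sd_br
print(round(mean_len, 3), round(sd_len, 3), round(naqada_len, 2))
print(round(mean_br, 3), round(sd_br, 3), round(naqada_br, 2))
```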

Another method of procedure is to take a random sample out of
the sample itself, assuming the latter is large enough to admit of
an adequate sub-sample, and to compare the constants of the
whole and part. When they do not differ beyond the limits allowed
by random sampling the inference is that the whole may be treated
as a homogeneous class if judged by this test alone.

Example (2). In an interesting and important memoir, On
Criminal Anthropometry and the Identification of Criminals, by W. R.
Macdonell [Biometrika, vol. i., pp. 177 et seq.], the author uses this
method to test the homogeneity of a class of 3000 criminals by
measuring also a random sample of 1306 criminals out of the 3000.
He obtained, for example,

S.D. of head length = 6.04593 ± 0.05265 mm., for the 3000 criminals;
                    = 6.00247 ± 0.07922 mm., for the 1306 criminals.

The difference between the variabilities in the sample and
sub-sample, by result (8),

$$= 0.04346 \pm \sqrt{(0.05265)^2 + (0.07922)^2}$$

$$= 0.04346 \pm 0.09512,$$

which  is  certainly  not  significant.  If  the  same  holds  good  with 
regard  to  the  means  and  other  constants,  then  the  whole  may  be 
said  to  be  homogeneous  so  far  as  this  test  goes. 
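
The figures of this example can be verified from the formula for the p.e. of a S.D., namely $0.6745\,\sigma/\sqrt{2n}$, combined with result (8). The sketch below follows the text in treating the two p.e.'s as though the sample and sub-sample were independent; the function name is illustrative.

```python
import math

def pe_of_sd(sd, n):
    """Probable error of the S.D. of a normal distribution of n observations."""
    return 0.6745 * sd / math.sqrt(2 * n)

pe_whole = pe_of_sd(6.04593, 3000)       # the 3000 criminals
pe_sub = pe_of_sd(6.00247, 1306)         # the sub-sample of 1306
difference = 6.04593 - 6.00247
pe_difference = math.hypot(pe_whole, pe_sub)
print(round(pe_whole, 5), round(pe_sub, 5), round(difference, 5), round(pe_difference, 5))
```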

Example (3). Another example may be given from the memoir
on Variation and Correlation in Brain Weight, by Raymond Pearl
[Biometrika, vol. iv., pp. 13 et seq.]. The author wished particularly
to investigate the change of brain weight with age; on the
hypothesis that the weight of the brain reaches a maximum between
the ages of 15 and 20, remains unchanged from 20 to 50, and then
begins to decline and so continues till death, the material was
divided into a 'Young' series, ages 20 to 50, and a 'Total' series
including all between 20 and 80. The 'Young' series thus formed
a selection from the 'Total' series, but a selection based on age
and not on brain weight. If there were no correlation between
age and brain weight, this selection, based as it is on age, would,
of course, be random as regards brain weight. Now correlation
does exist between the two, but it is so slight that, within the limits
of error, the 'Young' series does form practically a random sample
of the 'Total' series, as is shown by the following figures:

Difference in Variation Constants between Young and
Total Series (written with a positive sign when the
Young Series gives the greater value).

                        Male.                                  Female.
              S.D.               C. of V.            S.D.                C. of V.

Swedes        +2.851 ± 4.066     +0.122 ± 0.291      +4.786 ± 5.465      +0.271 ± 0.435
Bavarians     -1.888 ± 3.556     -0.173 ± 0.234      -10.357 ± 3.909     -0.941 ± 0.320

Thus in only one case, that of the Bavarian females, is the
difference between the variabilities, S.D. or C. of V., of the two series as
great as its probable error, and even in that case the differences,
10.357 and 0.941, are not three times as large as their respective
p.e.'s, 3.909 and 0.320. Dr. Pearl concludes from these and similar
results that 'the series are reasonably homogeneous in other respects
than age.'

The  reader  is  recommended  to  test  his  knowledge  of  the  formulae 
for  probable  errors  by  applying  them  to  the  following  examples. 
Dr.  Alice  Lee,  in  a  note  on  Dr.  Ludwig  on  Variation  and  Correlation 
in  Plants  [Biometrika,  vol.  i.,  p.  316]  makes  use  of  the  statistics 
relating to Ficaria Verna in Example (4). Those in Example (5)
are  taken  from  among  a  large  number  of  others  in  the  highly 
interesting  memoir,  On  the  Laws  of  Inheritance  in  Man,  by  Professor 
Karl  Pearson  and  Dr.  Alice  Lee  [Biometrika,  vol.  ii.,  pp.  357  et  seq.] 
cited  once  before. 


Example (4). Variation and Correlation in Ficaria Verna.

No. of              Mean No. of        Mean No. of        Correlation between
Observations.       Petals; S.D.       Sepals; S.D.       No. of Sepals and
                                                          No. of Petals.

1000 (Greiz A)      8.286; 1.3382      3.695; 0.8524      0.2439 ± 0.0201
1000 (Greiz G)      8.232; 0.9954      3.437; 0.7033      0.2480 ± 0.0200

We have here all the data necessary to find the p.e.'s of the
means, variabilities, and correlations, and we wish to know whether
the differences between the means and variabilities of the A and G
plants can be accounted for by random sampling alone.
For example, the difference between the petal means

$$= (8.286 - 8.232) \pm \tfrac{2}{3}\sqrt{\frac{(1.3382)^2}{1000} + \frac{(0.9954)^2}{1000}}$$

$$= 0.054 \pm 0.035.$$

Clearly this difference, being not so great as twice its p.e., is not
significant and may quite well be due to random sampling.
Again, the difference between the petal variabilities

$$= (1.3382 - 0.9954) \pm \tfrac{2}{3}\sqrt{\frac{(1.3382)^2}{2000} + \frac{(0.9954)^2}{2000}}$$

$$= 0.3428 \pm 0.025,$$

which is certainly much too great to be explained away by random
sampling merely.

Similarly  the  differences  between  the  sepal  means,  between  the 
sepal  variabilities,  and  between  the  correlations,  may  be  tested  for 
significance  by  comparison  with  their  p.e.'s. 
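
The petal calculations may be reproduced, and the remaining tests carried out in the same way, along the following lines. The function names are illustrative; the factor 0.6745 is used where the text writes the rougher 2/3, which alters the results only in the third decimal place.

```python
import math

PE = 0.6745   # factor converting an S.D. into a probable error

def pe_diff_means(sd_a, n_a, sd_b, n_b):
    """p.e. of the difference between two means, by result (9)."""
    return PE * math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)

def pe_diff_sds(sd_a, n_a, sd_b, n_b):
    """p.e. of the difference between two S.D.'s, each S.D. having
    p.e. 0.6745 * sd / sqrt(2n) for a normal distribution."""
    return PE * math.sqrt(sd_a ** 2 / (2 * n_a) + sd_b ** 2 / (2 * n_b))

# petals of Ficaria Verna, Greiz A against Greiz G (1000 plants each)
petal_mean_diff = 8.286 - 8.232
petal_mean_pe = pe_diff_means(1.3382, 1000, 0.9954, 1000)
petal_sd_diff = 1.3382 - 0.9954
petal_sd_pe = pe_diff_sds(1.3382, 1000, 0.9954, 1000)
print(round(petal_mean_diff, 3), round(petal_mean_pe, 3))
print(round(petal_sd_diff, 4), round(petal_sd_pe, 3))
```

Calling the same two functions with the sepal figures gives the corresponding tests for the sepals.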

Example (5). Size and Variability of Stature in the
Two Generations.

                         Father.          Mother.          Son.             Daughter.

Mean height (in.)        67.68 ± 0.06     62.48 ± 0.05     68.65 ± 0.05     63.87 ± 0.05
S.D. (in.)               2.70 ± 0.04      2.39 ± 0.04      2.71 ± 0.04      2.61 ± 0.03
C. of V. (per cent.)     3.99 ± 0.06      3.83 ± 0.06      3.95 ± 0.06      4.09 ± 0.05

The  student  in  this  case  might  use  one  of  the  formulae  for  the 
p.e.'s  to  find  the  number  of  fathers,  mothers,  sons,  or  daughters 
observed  when  the  p.e.'s  are  known,  and  then  the  remaining  p.e.'s 
might  be  verified  when  the  numbers  of  observations  are  found. 

As evidence of 'assortative mating,' the tendency of like to
mate with like, the following particulars are given, based on 1000
to 1050 cases of husband and wife:

Correlation between stature of husband and stature of wife = 0.2804 ± 0.0189
Correlation between span of husband and span of wife       = 0.1989 ± 0.0204
Correlation between forearm of husband and forearm of wife = 0.1977 ± 0.0205


To measure the average intensity of inheritance, the extent of
resemblance between parents and children in any character,
coefficients of correlation are calculated such as the following:

Coefficient of Correlation
between stature of father and stature of son      = 0.514 ± 0.015
between stature of father and stature of daughter = 0.510 ± 0.016
between stature of mother and stature of son      = 0.494 ± 0.016
between stature of mother and stature of daughter = 0.507 ± 0.016

[In verifying the p.e.'s for this case take the number of
observations to be 1024.]

One  more  extract  may  be  quoted,  a  prediction  table,  giving  the 
probable  mean  stature  of  sons  of  fathers  of  given  stature,  and 
so  on  : — 


Son's probable stature      = 33.73 + 0.516 (father's stature) ± 1.56
Daughter's probable stature = 30.50 + 0.493 (father's stature) ± 1.51
Son's probable stature      = 33.65 + 0.560 (mother's stature) ± 1.59
Daughter's probable stature = 29.28 + 0.554 (mother's stature) ± 1.52.


All  values  given  in  this  example  for  the  p.e.'s  should  be 
verified. 
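
As a start towards this verification, the p.e.'s of the parental correlations may be computed from the formula $0.6745\,(1-r^2)/\sqrt{n}$ with n taken as 1024. The sketch below is illustrative only; recomputed from the rounded coefficients, the figures agree with the printed p.e.'s to about the third decimal place.

```python
import math

def pe_of_correlation(r, n):
    """Probable error of a coefficient of correlation r from n observations."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

correlations = {
    "father and son": 0.514,
    "father and daughter": 0.510,
    "mother and son": 0.494,
    "mother and daughter": 0.507,
}
probable_errors = {pair: pe_of_correlation(r, 1024) for pair, r in correlations.items()}
for pair, pe in probable_errors.items():
    print(f"{pair}: {correlations[pair]:.3f} ± {pe:.4f}")
```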

Before  we  consider  further  applications  of  these  principles  to 
questions  of  a  somewhat  different  kind,  let  us  imagine  a  very 
simple  though  artificial  illustration.  Suppose  we  have  999  sheep, 
each  one  ticketed,  the  numbers  on  the  tickets  running  from  1  to 
999.  Also  suppose  666  of  these  sheep  are  white  and  333  are  black, 
so  that,  if  we  pick  out  any  one  at  random,  the  chance  of  it  being 
black is 333/999 or 1/3. Let us call picking a black sheep a 'success,'
then p = 1/3, q = 2/3.

We  proceed  now  to  select  99  sheep  in  succession  at  random 
from  the  flock  with  the  understanding  that  each  sheep  is  returned 
into  the  flock  before  the  next  is  picked  out.  This  insures  that 
the  chance  of  a  success  at  each  selection  remains  equal  to  1/3  and, 
of  course,  there  is  nothing  to  prevent  the  same  sheep  being  picked 
more  than  once.  The  selection  might  practically  be  made  by 
placing in a box 999 tickets, numbered from 1 to 999, one to
correspond to each sheep, then picking out 99 of them in succession,
being careful to replace each and to shake up the box before picking
out the next; if there were absolutely no difference between the
tickets, such as would cause one to be picked more easily than
another, the selection made in this way would be random in the
sense required, and the tickets so chosen would determine which
sheep were to be taken and which left.

The proportion of black sheep to be expected in such a random
selection of 99 is 1/3, but, if we only perform the experiment once,
it is quite likely that the proportion we actually get will differ from
1/3 by an amount

$$= 0.6745\sqrt{pq/n}$$

$$= 0.6745\sqrt{\tfrac{1}{3} \cdot \tfrac{2}{3} \cdot \tfrac{1}{99}}$$

$$= 1/31, \text{ about,}$$

while it is unlikely that the proportion will differ from 1/3 by much
more than 3/31, or 1/10.

Conversely (and it is really the converse which is useful in
practice), if we do not know the proportion of black sheep in the whole
flock, we may get a fair estimate of it by taking a random sample
of 99 sheep (any other number will serve the purpose, but the
larger the better for accuracy), and if we find that in this sample
there are 33 black sheep, i.e. p = 33/99 = 1/3, it will appear that
the value of p for the whole flock is 1/3, subject to a probable error
$0.6745\sqrt{pq/n}$ in excess or defect, i.e. the true proportion for the
whole flock may quite likely differ from 1/3 by as much as 1/31,
but it is unlikely to differ by much more than 1/10. It should be
noticed that the calculation of the probable error in this converse
case is based upon the value of p given by the sample taken, for
that is the only value of which we have knowledge.

Too  much  stress  can  scarcely  be  laid  on  the  fact  that  the  samples 
chosen  must  be  absolutely  unbiassed,  otherwise  the  use  of  the 
formulae $np$ and $\sqrt{npq}$, or the corresponding proportional formulae,
cannot be justified: each sheep in our illustration must have the
same  chance  of  being  picked,  and  no  one  selection  is  to  have  any 
influence  on  another.  The  failure  to  appreciate  this  essential 
point  has  led  to  no  little  waste  of  time  and  effort  in  the  collection 
of  valueless  statistics. 

The  method  of  sampling  has  been  employed  in  a  way  at  once 
interesting  and  useful  by  Dr.  A.  L.  Bowley,  and,  as  some  of  this 
work  has  barely  received  the  attention  it  deserves,  it  may  be  well 
to  explain  two  of  his  experiments  in  some  detail. 

The  first  was  of  interest  because  its  results  could  be  tested  by 
an  examination  of  the  original  record  from  which  the  sample  was 
taken.  The  details  concerning  it  are  abstracted  from  the  Journal 
of  the  Royal  Statistical  Society,  September  1906. 

Example    (6). — Bowley   sampled   the   dividends   paid   by   3878 
companies as quoted in the Investors' Record. His sample consisted
of 400 of these companies, i.e. about 10 per cent., selected in
a  purely  arbitrary  fashion  thus  :  the  investigator  took  a  Nautical 
Almanac and noted down the last digits of one of the tables, recording
them in groups of four, but if any particular group gave a
number  bigger  than  3878  he  rejected  it.  In  this  way  each  of  the 
numbers  between  1  and  3878  had  an  equal  chance  of  selection  (for 
numbers  under  four  figures  would  appear  like  0327,  0042,  0009, 
which  would  be  taken  to  represent  327,  42,  9  respectively),  and  the 
selection of one had no influence on that of any other. The companies
in the Investors' Record were numbered consecutively, and
the  dividends  corresponding  to  the  400  arbitrary  numbers  obtained 
formed  the  sample  with  which  Bowley  worked. 
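
The selection procedure just described amounts to what would now be called rejection sampling, and it may be sketched as follows. The digits of the Nautical Almanac are not reproduced here, so a seeded pseudo-random stream stands in for them (an assumption made purely for illustration); the function name `draw_sample` is likewise hypothetical. The group 0000, which corresponds to no company, is rejected along with the groups above 3878.

```python
import random

def draw_sample(digit_groups, population_size, sample_size):
    """Keep four-digit groups that fall in 1..population_size; reject the rest."""
    chosen = []
    for group in digit_groups:
        number = int(group)              # "0327" -> 327, "0009" -> 9
        if 1 <= number <= population_size:
            chosen.append(number)        # repeats are allowed, as with replaced tickets
            if len(chosen) == sample_size:
                break
    return chosen

# Stand-in for the almanac digits: a reproducible stream of four-digit groups.
rng = random.Random(1906)
groups = ("".join(rng.choice("0123456789") for _ in range(4)) for _ in range(10_000))

sample = draw_sample(groups, population_size=3878, sample_size=400)
print(len(sample), min(sample), max(sample))
```

Because every admissible four-digit group is equally likely and the groups are independent of one another, each company number from 1 to 3878 has the same chance of entering the sample at every draw.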

After  making  some  interesting  deductions  with  regard  to  the 
average for the whole distribution, to which we shall return presently,
he proceeded to forecast the grouping of the original companies
as to their dividends by setting out the grouping discovered
in the sample 400, as follows, using the standard deviation in place
of the probable error as the error due to random sampling:

Table (39). Distribution of Dividends paid by a
Sample of 400 Companies.

        (1)                  (2)                  (3)                       (4)
     Dividend.            Sample of     Percentage of Sample        Percentage of
                             400        Companies in each Class.    all Companies
                          Companies.                                in each Class.

Nil                          28          7   with S.D. = 1.27            6
£1 to £2, 19s. 9d.            6          1½                              1.5
£3 to £3, 9s. 9d.            37          9¼  with S.D. = 1.46            8.4
£3, 10s. to £3, 19s. 9d.     71         17¾  with S.D. = 1.90           18.8
£4 to £4, 9s. 9d.            64         16   with S.D. = 1.83           17.3
£4, 10s. to £4, 19s. 9d.     53         13¼  with S.D. = 1.68           13.8
£5 to £5, 19s. 9d.           60         15   with S.D. = 1.78           17.7
£6 to £7, 19s. 9d.           48         12   with S.D. = 1.63           10.8
£8 to £10, 19s. 9d.          29          7¼  with S.D. = 1.29            3.8
Above £11                     4          1                               1.9

In col. (3) the S.D. for each group was calculated as follows:
for the first group, out of 400 possible events we have 28 successful
events, meaning by 'successful' here 'a company paying no
dividend,' thus

$$p = 28/400, \qquad q = 372/400.$$

Hence the S.D. of the frequency in the first group

$$= \sqrt{28 \times 372}/20 = 5.1.$$

Since this is for a sample of 400, the S.D. of the 'percentage'* frequency
in the first group

$$= \tfrac{1}{4}(5.1) = 1.27.$$

The other S.D.'s are calculated in the same way, but when the
number in a class is very small the forecast can scarcely be relied
upon and consequently the S.D. is not inserted.
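
The same calculation can be run over every class of the sample at once. The sketch below takes the class counts from col. (2) of Table (39); the function name `sd_of_percentage` is illustrative only.

```python
import math

# Numbers of sample companies in the ten dividend classes, col. (2) of Table (39).
counts = [28, 6, 37, 71, 64, 53, 60, 48, 29, 4]
n = sum(counts)                              # size of the sample

def sd_of_percentage(count, n):
    """S.D. of the percentage frequency of a class holding `count` of n items."""
    p = count / n
    sd_of_frequency = math.sqrt(n * p * (1 - p))   # sqrt(npq), about 5.1 for the first class
    return 100 * sd_of_frequency / n               # rescale from a count to a percentage

for count in counts:
    print(count, round(100 * count / n, 2), round(sd_of_percentage(count, n), 2))
```

For the two classes holding only 6 and 4 companies the formula still returns a number, but, as remarked above, a forecast resting on so few cases can scarcely be relied upon.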

It will be noted, by comparing with the numbers in col. (4),
showing the corresponding percentages for all the 3878 companies,
that every forecast was remarkably good except one, class £8 to
£10, 19s. 9d., where the error approaches three times the S.D., and
the exception will serve as a warning that, in working with samples,
the unexpected sometimes happens. Professor Edgeworth, in his
Presidential Address to the Royal Statistical Society (1912), points
out that the method appears to be a permanent institution in
the Statistical Bureau at Christiania, where it has given very good
results. These can be checked or 'controlled' for safety if complete
statistics are obtainable under some heads. He fairly sums up the
utility of sampling when he says that 'we may obtain from samples
a general outline of the facts, often sufficient for the initiation of
a project like that of insurance, rather than the features in detail.'

Bowley  also  divided  up  his  400  random  samples  into  40  groups 
of  10  companies  each,  and  calculated  the  average  for  each  group. 
The  S.D.  for  these  40  averages  was  found  in  the  usual  way,  giving 
0.775. But since this was the S.D. for averages of 10, we conclude
that

$$(\text{the S.D. for the distribution of the 400 companies})/\sqrt{10} = 0.775,$$

i.e. the S.D. for the distribution of the 400 companies $= 0.775\sqrt{10}$.

Hence, applying the same principle again,

the S.D. of the average of the 400 sample companies

$$= 0.775\sqrt{10}/\sqrt{400} = \pounds 0.122.$$

[* It would not be correct to take $\sqrt{7(1-\tfrac{7}{100})}$ as the S.D. of the percentage
frequency in the first group; this value would be double the true value, namely,
$\tfrac{1}{4}\sqrt{28(1-\tfrac{28}{400})} = \tfrac{1}{2}\sqrt{7(1-\tfrac{7}{100})}$, because the accuracy is increased by increasing
the number of events in a sample, and the sample here is really 400 and not 100.]

Now the average of the 400 samples turned out to be £4.7435.
Hence it was judged that, if this was a fair selection (and the random
method adopted was such as to make it fair in all reasonable likelihood),
the average for the 3878 companies should certainly lie
between

$$\pounds[4.7435 \pm 3(0.122)].$$

The true average was found by actual calculation to be £4.779,
well within the above limits, although the original items varied from
nil to £103, being grouped according to the nature of the security
(Government, Railways, Mines, etc.), and the averages and
S.D.'s on successive pages differed materially. This aggregation,
Bowley  remarks,  is  very  similar  to  that  found  in  wages  in  different 
occupations  and  localities,  and  in  many  other  practical  examples. 
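
The chain of reasoning from the 40 group averages to the limits for the true average may be retraced numerically. The figures below are those quoted in the text; the variable names are illustrative.

```python
import math

sd_of_group_means = 0.775   # S.D. of the 40 averages, each an average of 10 companies
group_size = 10
sample_size = 400
sample_mean = 4.7435        # average dividend of the 400 sample companies, in pounds
true_mean = 4.779           # average for all 3878 companies, found by full calculation

# S.D. of a mean of k items = (S.D. of single items) / sqrt(k), applied twice over.
sd_of_items = sd_of_group_means * math.sqrt(group_size)
sd_of_sample_mean = sd_of_items / math.sqrt(sample_size)

lower = sample_mean - 3 * sd_of_sample_mean
upper = sample_mean + 3 * sd_of_sample_mean
print(round(sd_of_sample_mean, 3), round(lower, 3), round(upper, 3))
print(lower < true_mean < upper)
```

The discrepancy between the sample average and the true average is in fact well under one S.D. of the sample average, so the limits of three times the S.D. are generous in this instance.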

The  value  of  the  second  experiment  due  to  Dr.  Bowley  lies  in  the 
suggestion  that  similar  means  can  be  applied  with  good  results  to 
the  investigation  of  many  social  phenomena. 

If  out  of  a  large  group  a  comparatively  small  sample  of  statistics 
is  collected  in  the  purely  random  manner  already  described,  we  are 
able  by  such  means  to  estimate  what  is  the  average,  and  even  to 
obtain limits between which the average will almost certainly lie,
in  the  large  group  based  upon  values  found  for  the  average  and 
S.D.  in  the  small  sample. 

Example  (7). — With  the  collaboration  of  Mr.  Burnett-Hurst  and 
a number of other workers, Dr. Bowley conducted an inquiry into
the  conditions  of  working-class  households  in  four  representative 
towns (Northampton, Warrington, Stanley, and Reading), the
results  of  which  are  published  by  Messrs.  Bell  and  Sons  under  the 
title  of  Livelihood  and  Poverty.  They  are  similar  in  character  to 
those  obtained  by  Rowntree  in  his  study  of  conditions  in  York, 
but  what  is  peculiar  to  Bowley's  inquiry  is  that  only  a  sample, 
about  1  in  20,  of  the  working-class  houses  in  each  town  was 
examined,  and  the  conditions  in  the  towns  as  a  whole  were  deduced 
from  these  samples. 

We  are  not  concerned  here  with  the  actual  facts  disclosed  by  the 
investigation,  striking  as  they  are,  but  with  the  explanation  of  the 
sampling  method  adopted,  and  as  to  that  it  may  be  remarked  that 
the  foundation  on  which  it  rests  is  precisely  the  same  as  that  which 
underlay  the  example  of  the  999  black  and  white  sheep.  The 
main  point  to  notice  here  again  is  that  Bowley  was  careful  to  select 
his samples in unbiassed fashion as follows: 'For each town a list
of all houses . . . was obtained, and without reference to anything
except the accidental order (alphabetical by streets or otherwise)
in the list, one entry in twenty was ticked. The buildings so
marked, other than shops, institutions, factories, etc., formed the
sample.' It will be evident that this method of choice is not quite
on the same level of randomness as that followed, for example, in
drawing cards from a well-shuffled pack, each card to be replaced
and the pack reshuffled before the next is drawn; but, for that
very reason, the results of the experiment are all the more likely
to be well within the limits of error provided by the formulae of
the ideal case. The deliberate selection of every twentieth house
picture  of  the  town  as  a  whole  than  would  be  obtained  by  selecting 
the  same  number  of  houses  in  a  purely  random  fashion  which  might 
by  chance  give  too  much  emphasis  to  some  street  or  district. 
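
Ticking one entry in twenty is what is now usually called systematic sampling, and a minimal sketch of it follows. The list of houses is invented for the illustration and is not Bowley's; the function name `tick_every_kth` and the starting position are likewise assumptions.

```python
def tick_every_kth(entries, k=20, start=0):
    """Return every k-th entry of a list, beginning with the entry at index `start`."""
    return entries[start::k]

# A made-up street list of 18,000 entries in their 'accidental order'.
houses = [f"house {i}" for i in range(1, 18_001)]

sample = tick_every_kth(houses, k=20)
print(len(sample), sample[:3])
```

Since the ticks fall at regular intervals down the list, every street contributes to the sample nearly in proportion to its length, which is the reason given above for expecting the result to be at least as representative as a purely random draw.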

A  practical  test  of  the  goodness  of  the  sample  was  possible  by 
comparing  the  results  in  a  few  instances  with  information  available 
from  other  sources.  In  order  to  make  the  method  of  working 
quite  clear,  let  the  guiding  principle  first  be  recalled  : — 

'If, in a random sample of n items, the proportion of successes
is p, then the proportion of successes in the universe from which the
sample is selected will not be likely to fall outside the limits

$$p \pm 3(0.6745)\sqrt{pq/n},$$

and, if that universe contains altogether N items, the number of
successes will not be likely to fall outside the limits

$$Np \pm 3(0.6745)N\sqrt{pq/n}.'$$

In Reading the total number of all inhabited houses in the
borough was 18,000 at the time of the inquiry, i.e. N = 18,000.
The total number of houses visited was 840, i.e. n = 840. If we
call a house assessed at £8 or less a 'success,' the number of such
houses found in the sample was 206.

Thus

$$p = 206/840, \qquad q = 634/840,$$

and the number of houses rented at £8 or less in the whole borough
should be

$$Np \text{ with a p.e.} = 0.6745\,N\sqrt{pq/n},$$

i.e. 4414 ± 180.

The actual number of houses so rented was known from other sources
to be 4380, well within the limits forecasted.
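
The Reading forecast can be reproduced directly from the guiding principle. The following sketch uses only the figures given in the text; the variable names are illustrative.

```python
import math

N = 18_000        # inhabited houses in the borough
n = 840           # houses visited
successes = 206   # sample houses at £8 or less
actual = 4380     # houses so rented in the whole borough, known from other sources

p = successes / n
q = 1 - p

estimate = N * p                                     # forecast for the borough
probable_error = 0.6745 * N * math.sqrt(p * q / n)   # p.e. of that forecast

print(round(estimate), round(probable_error))
print(abs(actual - estimate) <= 3 * probable_error)  # inside the limits of three p.e.'s?
```

Here the forecast differs from the known figure by only about 34 houses, a small fraction of a single probable error.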

The value used for p in the above is that given by the sample,
but when we know the actual number of successes in the universe
as a whole, as in this case we do, we might use the true value of
p, i.e. the value for the universe in place of that for the sample.
The argument might also be put in another way without affecting
the principle employed, thus:

The number of houses rented at £8 or less in the whole borough
was 4380.

But the proportion of houses sampled in the whole borough was
840/18000, i.e. 1/21.43.

Hence the number of houses at the above rental to be expected
in the sample = 4380/21.43 = 204.

The number actually found in the sample was 206, with a probable
error

$$= 0.6745\sqrt{npq} = 0.6745\sqrt{840 \times \tfrac{4380}{18000} \times \tfrac{13620}{18000}} = 8, \text{ approximately.}$$

Again, the number of persons engaged in a certain occupation at
Reading was known to be 761 in the borough as a whole. Hence
the number of persons so engaged to be expected in the sample
was 761/21.43, i.e. 35.

The number actually found in the sample was 29 with a probable
error

$$= 0.6745\sqrt{npq} = 0.6745\sqrt{840 \times \tfrac{761}{18000} \times \tfrac{17239}{18000}} = 4, \text{ approximately.}$$

Further  examples  of  the  method  are  here  given,  in  each  of  which 
the  total  number  of  events  is  small  so  that  the  number  in  each 
sample  is  also  small,  and  since,  as  we  have  seen,  the  accuracy  or 
precision  of  the  proportion  of  successes  discovered  in  any  sample 
varies  directly  as  the  square  root  of  the  number  of  events  the  sample 
contains,  the  results  cannot  be  expected  to  be  so  good  when  this 
number  is  small. 

Example  (8). — 514  candidates  sat  a  certain  examination  paper  ; 
their  marks  ranged  from  3  to  64.  The  candidates  were  numbered 
consecutively from 1 to 514, and a random sample of 90 (17½ per
cent.) was selected from among them by writing down the 90
numbers  formed  by  the  digits  in  the  seventh  decimal  place,  taken 
in  groups  of  three,  in  the  logs  of  the  numbers  10104,  10204, 
10304,  .  .  .  ,  as  given  in  Chambers's  Tables,  neglecting  all  numbers 
greater  than  514  and  calling  such  numbers  as  005,  037,  etc. — 5, 
37,  etc.  In  this  way  each  of  the  numbers  between  1  and  514  stood 
an equal chance of inclusion.

The distribution of candidates in the sample is compared with
that  for  all  514  together  in  the  following  table  : — 


No. of Marks Obtained.     Percentage of All            Percentage of Candidates in
                           Candidates who obtained      Sample who obtained
                           these Marks.                 these Marks (with p.e.).

Less than 15                      8                            8 ± 1.9
15 but less than 25              19                           17 ± 2.6
25 but less than 30              16                           18 ± 2.7
30 but less than 35              18                           13 ± 2.4
35 but less than 40              15                           17 ± 2.6
40 but less than 50              19                           18 ± 2.7
50 and over                  [illegible]                      10 ± 2.1

The reader might verify the p.e.'s given in the last column:
e.g. proportion in the sample obtaining less than 15 marks = 7/90;
therefore $p = 7/90$, $q = 83/90$.

Hence the S.D. for this group

$$= \sqrt{7(1-\tfrac{7}{90})} = 2.54,$$

and the S.D. for the percentage

$$= \tfrac{100}{90} \times 2.54 = 2.8.$$

Thus the p.e. for the percentage

$$= \tfrac{2}{3}\sigma = 1.9, \text{ approximately.}$$

Example  (9)  deals  in  a  similar  way  with  the  data  concerning 
infectious  diseases  in  241  towns  in  England  and  Wales  previously 
recorded  on  p.  62. 

A  sample  of  60  towns,  i.e.  about  25  per  cent.,  was  chosen  in  a 
random  fashion  as  in  the  last  example,  and  the  sample  distribution 
is  compared  below  with  that  of  the  241  towns  as  a  whole. 

The  verification  of  the  probable  errors  in  this  and  the  next  case 
is  left  to  the  reader. 


Case Rate per 1000        Actual No. of        No. as suggested by
of the Population.        Towns so rated.      the Sample (with p.e.).

1 and under 5                  85                   92 ± 10
5 and under 9                  86                   96 ± 10
9 and under 13                 42                   28 ± 7
13 and over                    28                   24 ± 6

Example (10) is concerned with the annual output per head in
142 different types of employment as given in 1907 by the Census
of Production [data from Sixteenth Abstract of Labour Statistics of
the  United  Kingdom,  Cd.  7131].  The  distribution  suggested  by  a 
random  sample  of  50  different  occupations  is  compared  with  that  of 
the  complete  list  of  142  occupations. 


Output per head.         No. of Occupations     No. in Complete         Actual No.
                         in Sample with         List as deduced         found in
                         this Output.           from Sample (p.e.).     Complete List.

Under £60                       4                  11 ± 3.6                 12
£60 and under £80              16                  45 ± 6.2                 42
£80 and under £100              6                  17 ± 4.3                 25
£100 and under £120            10                  28 ± 5.3                 20
£120 and under £190             8                  23 ± 4.9                 27
£190 and over                   6                  17 ± 4.3                 16

The  S.D.  in  each  of  the  last  three  examples  has  been  calculated 
by  using  the  value  for  p  given  by  the  sample,  which  is  the  value 
one  must  fall  back  upon  in  practice  when  the  true  p  for  the  whole 
distribution  is  unknown.  In  any  case  where  we  are  able  to  test 
our  sample  by  comparison  with  the  whole  distribution,  however, 
it  is  possible  to  use  the  true  value  of  p,  e.g.  in  Example  (10) 
output £100-120, p = 20/142 as opposed to 10/50.
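
The distinction between the sample value and the true value of p can be made concrete with the figures of Example (10). The sketch below is an illustration only: the function name `deduced_with_pe` and its optional argument are assumptions, and the factor 0.6745 is used throughout, so the last figure of a probable error may differ slightly from the table if a rounder factor such as 2/3 was used there.

```python
import math

sample_counts = [4, 16, 6, 10, 8, 6]       # occupations in the sample, by output class
actual_counts = [12, 42, 25, 20, 27, 16]   # occupations in the complete list
n = sum(sample_counts)                     # size of the sample
N = sum(actual_counts)                     # size of the complete list

def deduced_with_pe(count, n, N, p=None):
    """Number deduced for the complete list, with its probable error.

    By default p is the proportion found in the sample; a true p may be supplied.
    """
    if p is None:
        p = count / n
    estimate = N * count / n
    pe = 0.6745 * N * math.sqrt(p * (1 - p) / n)
    return estimate, pe

# Class £100-120: p from the sample (10/50) against the true p (20/142).
est, pe_sample = deduced_with_pe(10, n, N)
_, pe_true = deduced_with_pe(10, n, N, p=20 / 142)
print(round(est), round(pe_sample, 1), round(pe_true, 1))
```

In this class the sample happens to overstate p, so the probable error computed from the true p comes out somewhat smaller than the one computed from the sample.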
CHAPTER XV

CURVE FITTING: PEARSON'S GENERALIZED

PROBABILITY CURVE

It  may  be  recalled  that  in  the  introductory  chapter  an  outline  was 
given  of  the  manner  in  which  the  theory  of  Statistics  might  be 
conceived to develop. It was shown how the desire for simplification
and the need for compression leads to the division of a large
mass  of  figures  dealing  with  any  given  matter  into  groups  ;  indeed, 
it  may  well  be  that  the  statistics  have  been  so  arranged  at  the 
source  in  the  act  of  collecting  :  e.g.  we  may  have  to  deal  with 
so  many  males  of  height  54  in.  and  less  than  55  in.,  so  many  of 
height  55  in.  and  less  than  56  in.,  so  many  of  height  56  in.  and  less 
than  57  in.,  and  so  on.  Here  corresponding  to  each  given  height, 
which we may label x, or each range of height, such as $x_1$ to $x_2$,
we  have  a  certain  frequency  of  males  of  that  height  or  range, 
which  frequency  we  may  label  y,  and  hence  a  frequency  table  can 
be  formed  showing  the  variation  of  y  with  x.  Further  we  have 
seen  how  such  pairs  of  corresponding  values  of  x  and  y  can  be 
plotted  so  as  to  picture  the  complete  observed  frequency  distribution 
to  the  eye. 

Now  the  representation  thus  made,  though  helpful  up  to  a  point, 
is  not  entirely  satisfactory.  Whether  we  simply  join  up  successive 
points (x, y), or set up rectangles of varying height y on bases
spanning  the  successive  ranges  of  x,  or  erect  ordinates  (y's)  at  the 
mid-points  of  these  bases,  joining  the  summits  in  the  manner 
previously  described,  the  connection  so  established  between  each 
observation  and  the  next  is  too  superficial,  depending  merely  on 
the  fact  of  casual  neighbourship,  and  may  sometimes  give  a  false 
impression  of  frequency  and  changes  in  frequency  in  the  population 
of which the observations are but a sample. And this is necessarily
so if we confine ourselves strictly to the data observed.

One difficulty which has to be faced is that only within certain
broad limits can we trust our observations to give us information
which is truly representative of the population in which we are
interested. We seldom if ever deal with the whole population:
in fact it may be so large that it is impracticable even to reckon it;
instead we make a random or unbiassed selection of a smaller but
adequate number of individuals belonging to the population, and
classify them according to the size or nature of the character which
concerns us. But, granted that our sample is adequate in size
and unbiassed, the numbers obtained in the different groups of the
frequency distribution will still be subject to the errors of random
sampling, and it is only after these errors have been calculated that
we can lay down the probable limits within which our sample may
be regarded as really representative of the population as a whole.

Another  difficulty  arises  owing  to  the  fact  that  our  observations 
in  general  do  not  cover  the  whole  field  of  values  of  the  variables  x 
and y; we may quite likely want to know the percentage frequency,
y, of individuals with a character (height or whatever it may be) x
which does not chance to be any one of the x's observed, if the
observations are only recorded according to discrete (separately
distinct, like 5 ft., 6 ft., 7 ft.) values of x; on the other hand, if
the  observations  have  been  classed  in  groups,  the  frequency  in 
which  we  are  interested  may  refer  to  an  x  which  does  not  coincide 
with  the  centre  of  any  group  or  which  is  even  outside  the  range 
altogether.  We  have  therefore  further  to  inquire  whether  such 
information  can  be  deduced  in  any  way  from  the  statistics  collected. 
Now  it  so  happens  that  both  these  difficulties  disappear  if  we 
can  only  attain  the  ideal  already  outlined  in  discussing  graphs, 
and  find  a  suitable  curve  to  '  fit '  the  statistics  observed.  Such  a 
curve  would  not  necessarily  pass  through  all  or  any  of  the  points 
(x, y) representing the observations, for these, as we have remarked,
are subject to errors of random sampling and the observed frequency
y of any x may be greater or less than the corresponding y in the
population at large to which the curve is presumed to approximate.
The curve in short must remove the roughnesses which are inseparable
from ordinary observation. Moreover, given any x, not
merely  one  of  the  x's  observed,  it  must  be  possible  to  read  off  from 
it  the  corresponding  y,  the  frequency  appropriate  to  that  x. 

It  is  not  always  accurate  enough  for  our  purpose  to  draw  a  curve 
by  eye,  passing  as  evenly  as  possible  through  the  middle  of  the 
points observed in the manner conceived in an earlier chapter. It
is  necessary  in  some  way  to  find  an  algebraical  formula,  possibly 
even  a  trigonometrical,  exponential,  or  more  complex  expression, 
which  will  give  the  y  corresponding  to  any  x  desired.  This  formula 
or equation must depend upon the statistics collected: i.e. the
constants involved in it must be directly and fairly easily computed
from the y's observed, and the results of all the observations should
enter  into  the  equations  which  determine  the  constants  in  order  to 
make  use  of  the  full  information  at  our  disposal.  In  addition,  the 
method  of  determining  the  equation  and  its  constants  should  be  as 
general  as  possible,  so  relieving  us  of  the  trouble  of  discovering  a 
new  method  owing  to  the  failure  of  the  original  one  at  nearly  every 
trial.  Finally,  the  equation  should  not  be  so  intricate  as  to  make 
the  labour  of  calculating  y  for  any  given  x  too  heavy  to  be  attempted 
with  the  ordinary  equipment  at  the  statistician's  disposal.  Once 
such  an  equation  is  found  it  is  a  fairly  straightforward  proceeding 
to trace the curve for which it stands, and it will remain afterwards
to  test  the  goodness  of  fit  in  some  more  refined  way  than  by  seeing 
how  closely  it  passes  through  the  observed  points  by  eye. 

When we come to review the shapes of the frequency polygons
or histograms most commonly met, we find that the majority
of them start from low frequency, rise to a maximum
as x, the character observed, increases, then fall again towards
zero very likely at a different rate. In fact the
statistics suggest a shape something like that shown in fig. (27)
for the corresponding frequency curve, though we cannot be sure
that it would coincide with the axis at either extremity. [Cases
do occur where the curve has two or even more humps (maxima),
but we purposely restrict ourselves to the simpler and more frequent
type described.]

Now  the  simplest  shape  to  deal  with  from  the  algebraical  point 
of  view  would  certainly  be  symmetrical  in  character,  corresponding 
to  statistics  which  rise  and  fall  at  the  same  rate,  though  this  would 
not  necessarily  be  the  most  common  shape  among  the  records  of 
actual life. In order to simplify our problem, therefore, we might
start  by  making  up  for  ourselves  an  ideally  simple  set  of  statistics 
which  are  perfectly  symmetrical,  and  see  whether  we  can  discover 
a  process  for  fitting  a  curve  in  a  case  of  that  kind.  If  this  prove 
successful  it  might  be  possible  afterwards  to  adapt  the  same  process 
to  an  unsymmetrical  or  '  skew  '  set  of  statistics  made  up  in  a  similar 
way.  Then  finally  we  should  inquire  whether  actual  observations 
conform  to  any  of  the  types  of  curve  discovered,  and,  if  so,  how 
they  can  be  fitted  together. 

Now in manufacturing our statistics we must keep before us the
object at which we are aiming. Given the statistics, what we
want  is  a  formula,  algebraical  or  of  some  other  kind,  to  fit  them. 
This  raises  the  possibility  of  choosing  the  statistics  themselves  in 
some  algebraical  form,  and  such  a  form  is  at  hand  in  the  binomial 
expansion,  which  is,  in  fact,  one  of  the  first  examples  of  a  general 
symmetrical  expression  one  meets.     Thus 

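$$\begin{aligned}
(a+b)^1 &= a+b \\
(a+b)^2 &= a^2+2ab+b^2 \\
(a+b)^3 &= a^3+3a^2b+3ab^2+b^3 \\
(a+b)^4 &= a^4+4a^3b+6a^2b^2+4ab^3+b^4 \\
(a+b)^5 &= a^5+5a^4b+10a^3b^2+10a^2b^3+5ab^4+b^5 \\
(a+b)^n &= a^n+na^{n-1}b+\frac{n(n-1)}{1\cdot 2}a^{n-2}b^2+\dots+b^n
\end{aligned}$$

Clearly all these expressions become perfectly symmetrical if we
put a = b, for they read the same whether we run from left to right
or from right to left.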

We have already seen what an important part the binomial
expansion plays in the early stages of the theory of probability:
e.g. $(\tfrac{1}{2}+\tfrac{1}{2})^{10}$, when expanded, tells us at once the proportion of times
on the average we may expect 10 heads, 9 heads and 1 tail, 8 heads
and 2 tails, and so on, when we toss an evenly-balanced coin ten
times in succession; or again, if p is the probability that a certain
event will happen, and q the probability that it will fail to happen
at one trial, then the probabilities that it will happen n times,
(n-1) times, (n-2) times, . . . in n trials are given by the successive
terms in the expansion of $(p+q)^n$. However, we make no
assumption for the moment as to the values of a and b, except
that in the symmetrical case with which we begin they are equal,
and we have as the successive terms of $(a+a)^n$:

$$a^n,\; na^n,\; \frac{n(n-1)}{1\cdot 2}a^n,\; \dots,\; \frac{n(n-1)}{1\cdot 2}a^n,\; na^n,\; a^n.$$
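
The symmetry of these terms is easily exhibited by computing them. The sketch below is illustrative; the function name `symmetric_terms` is an assumption, and a is taken as 1/2 so that the terms are the coin-tossing probabilities mentioned above.

```python
from math import comb

def symmetric_terms(n, a=1.0):
    """Successive terms of (a + a)^n: the binomial coefficients multiplied by a^n."""
    return [comb(n, k) * a**n for k in range(n + 1)]

# Ten tosses of an evenly-balanced coin: terms of (1/2 + 1/2)^10.
terms = symmetric_terms(10, a=0.5)

print(terms == terms[::-1])   # the list reads the same in both directions
print(sum(terms))             # the probabilities together make up unity
print(max(terms))             # the middle term, 5 heads and 5 tails
```

With n = 10 there are eleven terms, and the largest, 252/1024, stands in the middle with equal terms ranged on either side of it.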

Let us suppose that our observed statistics take the above form
so that these terms may be plotted as a succession of ordinates,
$y_1, y_2, y_3, \dots, y_{n+1}$, associated with abscissae, $x_1, x_2, x_3, \dots, x_{n+1}$,
at equal distances apart measured, say, by c; for convenience we
may place the origin as in fig. (28), so that

$$x_1 = c,\; x_2 = 2c,\; x_3 = 3c,\; \dots,\; x_{n+1} = (n+1)c,$$

and we can then form a frequency polygon, where

$$x_r = rc, \qquad y_r = a^n\,\frac{n(n-1)(n-2)\cdots(n-r+2)}{1\cdot 2\cdot 3\cdots(r-1)}$$

are typical values of a pair of the variables x and y, each such
pair defining a vertex of the polygon.

Fig. (28).

Now in this case, since the statistics have been artificially built
up by ourselves and are not in reality a random selection, they are
not subject to errors of sampling and the fitting curve should,
therefore, pass through the summits of all the y's, or, perhaps
better, touch each of the lines joining adjacent summits. The
curve only differs from the neighbouring outline of the polygon in
that the latter is discontinuous, it alters its direction relative to the
axis of X by jerks at equal intervals c measured along OX, whereas
the former must rise gradually and continuously and then fall in
the same way. This is one sense in which we mean that the fitting
curve removes the roughness of the observation statistics: it gets
rid of jerks besides filling gaps in the observations.

It  will  be  clear  that  as  n  increases  and  c  diminishes  (and  this  is 
what  we  aim  at  in  collecting  statistics,  though  it  has  not  been  assumed 
in  what  immediately  follows)  the  discontinuity  in  the  polygon 
becomes  less  and  less  pronounced  and  the  outline  of  the  figure 
approximates  more  and  more  closely  to  the 
curve.  Moreover  this  approximation  gains  in 
intensity  if  we  make  the  slope  of  the  curve  at 
each  appropriate  point  the  same  as  the  slope 
obtained  by  joining  up  the  summits  of  adjacent 
ordinates  of  the  polygon. 


Now the expression

$$(y_{r+1}-y_r)/c$$

is the measure of the gradient from the rth ordinate to the (r+1)th,
and

$$\begin{aligned}
\frac{y_{r+1}-y_r}{c} &= \frac{a^n}{c}\left[\frac{n(n-1)\cdots(n-r+1)}{1\cdot 2\cdots r}-\frac{n(n-1)\cdots(n-r+2)}{1\cdot 2\cdots(r-1)}\right] \\
&= \frac{a^n}{c}\,\frac{n(n-1)\cdots(n-r+2)}{1\cdot 2\cdots(r-1)}\left[\frac{n-r+1}{r}-1\right] \\
&= y_r\,\frac{n-2r+1}{rc}.
\end{aligned}$$

If this be also taken as the gradient of the tangent to the curve at
the point midway between $(x_r, y_r)$ and $(x_{r+1}, y_{r+1})$, calling this point
(x, y) we have, since, in the notation of the differential calculus,
$dy/dx$ is the measure of the gradient of the curve at this point,

$$\frac{dy}{dx} = \frac{y_{r+1}-y_r}{c} = y_r\,\frac{n-2r+1}{rc}.$$
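And

$$x = \tfrac{1}{2}(x_r+x_{r+1}) = \tfrac{1}{2}[rc+(r+1)c] = \tfrac{1}{2}(2r+1)c,$$

$$y = \tfrac{1}{2}(y_r+y_{r+1}) = \frac{a^n}{2}\,\frac{n(n-1)\cdots(n-r+2)}{1\cdot 2\cdots(r-1)}\left[1+\frac{n-r+1}{r}\right] = y_r\,\frac{n+1}{2r}.$$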

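Hence

$$\frac{dy}{dx} = y_r\,\frac{n-2r+1}{rc} = \frac{2ry}{n+1}\cdot\frac{(n+2)-(2r+1)}{rc} = \frac{2y}{(n+1)c}\left[(n+2)-\frac{2x}{c}\right].$$

Thus

$$\frac{dy}{dx} = \frac{2y}{(n+1)c}\left(n+2-\frac{2x}{c}\right).$$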

But if we had started with any other two adjacent ordinates
instead of $y_r$ and $y_{r+1}$ we should have been led to exactly the
same relation connecting the corresponding x and y of the required
curve, for r, which serves to particularize the ordinates, does not
appear in the relation at all: their individuality has been eliminated.
The above equation may thus, if we please, be taken as holding
good for, and therefore defining, all points $(x, y)$ of the fitting
curve: it is, in short, the differential equation of that curve.
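
The elimination of r can be checked numerically. The following is a
minimal sketch, not part of the original argument: it assumes the
ordinates are $y_r=a^n\binom{n}{r-1}$ erected at $x_r=rc$, takes
$a=\tfrac12$ by default, and uses the hypothetical helper name
`midpoint_gradient_check`. Exact rational arithmetic is used so that
equality can be tested without rounding.

```python
from fractions import Fraction
from math import comb


def midpoint_gradient_check(n, c=Fraction(1), a=Fraction(1, 2)):
    """Return True if, at every mid-point of the polygon, the gradient of
    the side equals 2y/((n+1)c) * (n + 2 - 2x/c)."""
    # Ordinates y_1 .. y_{n+1} of the symmetrical binomial, at x_r = r*c.
    y = {r: a**n * comb(n, r - 1) for r in range(1, n + 2)}
    for r in range(1, n + 1):
        gradient = (y[r + 1] - y[r]) / c        # slope of the polygon side
        x_mid = Fraction(2 * r + 1, 2) * c      # x = (2r+1)c/2
        y_mid = (y[r] + y[r + 1]) / 2           # y = (y_r + y_{r+1})/2
        rhs = 2 * y_mid / ((n + 1) * c) * (n + 2 - 2 * x_mid / c)
        if gradient != rhs:
            return False
    return True
```

Calling `midpoint_gradient_check(10)` tests all ten sides of the
polygon for n = 10; the interval c may be any positive rational.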

The equation may be slightly simplified by transferring the
origin to the point $\left((n+2)\tfrac{c}{2},\,0\right)$, evidently the
point O' in fig. (28) corresponding to the maximum ordinate of the
polygon or curve. Algebraically, this merely means that for x we must
write $x+(n+2)\tfrac{c}{2}$ in the equation, which then becomes

$$\frac{dy}{dx}=\frac{2y}{(n+1)c}\left(-\frac{2x}{c}\right)=-\frac{4xy}{(n+1)c^2}.$$

We may pass to the equation proper of the curve by integration.
Thus, separating the variables,

$$\int\frac{dy}{y}+\frac{4}{(n+1)c^2}\int x\,dx=0.$$

Therefore,

$$\log y+\frac{2x^2}{(n+1)c^2}+A=0,$$

where A is a constant. Hence

$$y=y_0\,e^{-2x^2/(n+1)c^2},$$

where $y_0$ is a new constant. This may be written

$$y=y_0\,e^{-x^2/2\sigma^2},\qquad(1)$$

where $\sigma^2=(n+1)c^2/4$, and it is called the probability curve or
normal curve of error.*
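
How closely curve (1) follows the polygon can be gauged with a rough
numerical sketch. It rests on assumptions not stated in the text:
the binomial is taken as $(\tfrac12+\tfrac12)^n$, so that the ordinates
are $\binom{n}{r-1}/2^n$, and $y_0$ is fixed by making the area under
the curve equal to c times the total frequency, i.e.
$y_0=c/(\sigma\sqrt{2\pi})$. The function names are hypothetical.

```python
from math import comb, exp, pi, sqrt


def normal_ordinate(x, n, c=1.0):
    """Ordinate of curve (1) with sigma^2 = (n+1)c^2/4.
    y0 = c/(sigma*sqrt(2*pi)) is an assumed normalisation."""
    sigma2 = (n + 1) * c * c / 4
    y0 = c / sqrt(2 * pi * sigma2)
    return y0 * exp(-x * x / (2 * sigma2))


def max_discrepancy(n, c=1.0):
    """Largest absolute gap between a binomial ordinate and the curve."""
    worst = 0.0
    for r in range(1, n + 2):
        x = (r - (n + 2) / 2) * c          # x_r = rc, measured from (n+2)c/2
        y_r = comb(n, r - 1) / 2**n        # ordinate of the polygon
        worst = max(worst, abs(y_r - normal_ordinate(x, n, c)))
    return worst
```

Under these assumptions the gap shrinks as n grows, which is the
behaviour the argument above leads one to expect.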

[* Karl Pearson's method of getting the normal curve equation has been
adopted as the basis of the above discussion, in preference to that
usually followed, which develops the curve also from the binomial
expression but somewhat on the lines of Laplace and Poisson. They
showed that the sum of all the terms lying within a range t on either
side of the maximum term in the expansion of $(p+q)^n$ is approximately

$$\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-t}^{+t}e^{-x^2/2\sigma^2}\,dx,$$

where $\sigma=\sqrt{npq}$, whence the equation of the curve is derived.
(See Historical Note at the end of Chapter xviii.)]

Let us now see whether the procedure so far followed is applicable
in the case of an unsymmetrical or skew distribution of statistics.
With this object we will suppose the frequencies of observations in
successive groups to be represented by the corresponding terms in
the expansion

$$(p+q)^n,$$

and as before we can form a frequency polygon by joining the
summits of the ordinates

$$p^n,\quad np^{n-1}q,\quad \frac{n(n-1)}{1\cdot 2}\,p^{n-2}q^2,\quad\ldots$$

erected on the axis of x at distances from the origin given by

$$x_1=c,\quad x_2=2c,\quad x_3=3c,\quad\ldots,\quad x_{n+1}=(n+1)c,$$

the figure being very similar to that in the symmetrical case.

The gradient of the fitting curve where it touches the join of
$(x_r, y_r)$ to $(x_{r+1}, y_{r+1})$ is given by

$$\frac{dy}{dx}=\frac{y_{r+1}-y_r}{c},$$

and we must try and express the right-hand side as before in
terms of $(x, y)$, the co-ordinates of the mid-point of the line joining
$(x_r, y_r)$ to $(x_{r+1}, y_{r+1})$.

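We have

$$\frac{dy}{dx}=\frac1c\left[\frac{n(n-1)\cdots(n-r+1)}{1\cdot 2\cdots r}\,p^{n-r}q^{r}-\frac{n(n-1)\cdots(n-r+2)}{1\cdot 2\cdots(r-1)}\,p^{n-r+1}q^{r-1}\right]$$

$$=\frac{p^{n-r}q^{r-1}}{c}\,\frac{n(n-1)\cdots(n-r+2)}{1\cdot 2\cdots(r-1)}\left[\frac{n-r+1}{r}\,q-p\right].$$

Also

$$2x=x_r+x_{r+1}=rc+(r+1)c=(2r+1)c,$$

$$2y=y_r+y_{r+1}=p^{n-r}q^{r-1}\,\frac{n(n-1)\cdots(n-r+2)}{1\cdot 2\cdots(r-1)}\left[\frac{n-r+1}{r}\,q+p\right].$$

Thus

$$\frac{dy}{dx}=\frac{2y}{c}\left(\frac{n-r+1}{r}\,q-p\right)\Big/\left(\frac{n-r+1}{r}\,q+p\right)$$

$$=\frac{2y}{c}\,[(n+1)q-r(p+q)]\,/\,[(n+1)q+r(p-q)]$$

$$=\frac{2y}{c}\,[2(n+1)qc-(p+q)(2x-c)]\,/\,[2(n+1)qc+(p-q)(2x-c)].$$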

This, being true for all such pairs of values of x and y, is now in a
form independent of any particular point on the curve we seek;
in other words, it may be taken as the differential equation of the
curve, and it is evidently of the type

$$\frac{dy}{dx}=\frac{y(\alpha-x)}{\beta+\gamma x},$$

where $\alpha$, $\beta$, $\gamma$ involve only p, q, n, etc., the
constants of the distribution we set out to fit.
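
The skew relation can be verified in the same exact way as the
symmetrical one. This is a minimal sketch under the assumption that
the ordinates are $y_r=\binom{n}{r-1}p^{n-r+1}q^{r-1}$ at $x_r=rc$;
the helper name `skew_gradient_check` is hypothetical.

```python
from fractions import Fraction
from math import comb


def skew_gradient_check(n, p, q, c=Fraction(1)):
    """Return True if every polygon side has gradient
    (2y/c) * [2(n+1)qc - (p+q)(2x-c)] / [2(n+1)qc + (p-q)(2x-c)]
    at its mid-point (x, y)."""
    y = {r: comb(n, r - 1) * p**(n - r + 1) * q**(r - 1)
         for r in range(1, n + 2)}
    for r in range(1, n + 1):
        gradient = (y[r + 1] - y[r]) / c
        x = Fraction(2 * r + 1, 2) * c          # mid-point abscissa
        y_mid = (y[r] + y[r + 1]) / 2           # mid-point ordinate
        num = 2 * (n + 1) * q * c - (p + q) * (2 * x - c)
        den = 2 * (n + 1) * q * c + (p - q) * (2 * x - c)
        if gradient != 2 * y_mid / c * num / den:
            return False
    return True
```

Passing p and q as `Fraction` values (for example 1/3 and 2/3) keeps
the comparison exact; with p = q the check reduces to the symmetrical
case.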
The equation is simplified if we transfer the origin to the point
$(\alpha, 0)$, when it becomes

$$\frac{dy}{dx}=-\frac{yx}{\gamma x+\delta},$$

where $\delta=\beta+\gamma\alpha$.

To integrate, separate the variables as before:

$$\int\frac{dy}{y}+\int\frac{x}{\gamma x+\delta}\,dx=0.$$

Therefore,

$$\log y+\frac1\gamma\int\frac{(\gamma x+\delta)-\delta}{\gamma x+\delta}\,dx=0,$$

i.e.

$$\log y+\frac{x}{\gamma}-\frac{\delta}{\gamma^2}\log(\gamma x+\delta)+A=0,$$

where A is a constant, or

$$y=B\,e^{-x/\gamma}(\gamma x+\delta)^{\delta/\gamma^2},$$

where B is a constant. It may be written

$$y=y_0\,e^{-kx}\left(1+\frac{x}{a}\right)^{ka},\qquad(2)$$

where $k=1/\gamma$, $a=\delta/\gamma$, and $y_0$ is a new constant.
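
That curve (2) does satisfy the differential equation from which it
was obtained may be confirmed numerically. The sketch below is only an
illustration: the function names, the step h of the central
difference, and the sample values of $\gamma$ and $\delta$ are all
assumptions made for the purpose.

```python
from math import exp


def skew_curve(x, k, a, y0=1.0):
    """Curve (2): y = y0 * exp(-k x) * (1 + x/a)^(k a), valid for 1 + x/a > 0."""
    return y0 * exp(-k * x) * (1 + x / a) ** (k * a)


def ode_residual(x, gamma, delta, h=1e-6):
    """dy/dx + x*y/(gamma*x + delta), with dy/dx taken by central difference.
    The result should be close to zero if (2) solves the equation."""
    k, a = 1 / gamma, delta / gamma
    dydx = (skew_curve(x + h, k, a) - skew_curve(x - h, k, a)) / (2 * h)
    return dydx + x * skew_curve(x, k, a) / (gamma * x + delta)
```

For instance `ode_residual(0.7, 0.5, 2.0)` differs from zero only by
the error of the finite difference.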

This,  then,  may  prove  a  suitable  type  of  curve  to  fit  a  set  of 
statistics  forming  a  skew  frequency  distribution,  but  the  question 
now  arises  whether  equations  (1)  and  (2)  are  the  most  general 
types  possible.  Clearly  (1)  is  only  a  particular  case  of  (2)  obtained 
by  making  p=q,  and,  this  being  so,  (2)  may  itself  be  a  particular 
case  of  some  still  more  general  type. 

Light may be thrown on this if we consider the geometrical
bearing of the differential equation obtained in the last case:

$$\frac{dy}{dx}=\frac{y(\alpha-x)}{\beta+\gamma x}.\qquad(3)$$


The presence of $y$ and $(\alpha-x)$ in the numerator of the right-hand
side of (3) shows that $dy/dx$ vanishes when $y=0$ and when $x=\alpha$,
i.e. the curve touches the axis of x where the two meet and there is a
maximum point on the curve at $x=\alpha$. (Since $\alpha$ is the
particular value of the organ or character x for which the frequency
is a maximum, $\alpha$ is of course the mode.) Now these two
characteristics are the very ones to which we wished to give
symbolical expression since they serve to describe in broad outline
what was agreed to be the trend of the majority of frequency
distributions: the rise from zero to a maximum, at first gradually,
then faster, and, after passing through the maximum, the fall to zero
again, generally at a different rate.

As to the denominator of equation (3), the corresponding equation
for type (1), before the origin was changed, was similar to equation (3),
except that it contained no x term in the denominator, and that is
readily understood when we note that $\gamma$ is a multiple of $(p-q)$
and thus vanishes when $p=q$. Now, if from (3) we get a less
general type of curve by dropping the x term in the denominator,
we may perhaps get a more general type by adding an $x^2$ term, and
even an $x^3$ term, an $x^4$ term, and so on. In fact there seems no
reason why the denominator should not be any function of x, say
$f(x)$, which, however, we shall suppose for simplicity capable of
expansion in a Maclaurin's series of ascending powers of x which
converges quickly.

We are led to propose, therefore, as more general than (3), the
differential equation

$$\frac{dy}{dx}=\frac{y(x+b)}{px^2+qx+r}.\qquad(4)$$

We stop at $x^2$ in the denominator because it has been found, if we
may anticipate results to save needless labour, that beyond this
point the heaviness of the calculation involved and the decreasing
accuracy of the higher moments that have to be introduced outweigh
any other advantage gained. The curve or set of curves
resulting from the integration of equation (4) is known as Karl
Pearson's Generalized Probability Curve, and their author has
stated that, while it comprises the two other types as special cases,
it practically covers all homogeneous statistics he has had to deal
with.

Just  as  the  differential  equations  in  the  first  two  cases  considered 
were  related  respectively  to  the  symmetrical  and  the  skew  binomial 
expansions,  so  is  equation  (4)  related  to  the  hypergeometrical 
expansion 

the successive terms of which express the probability that r black
balls, (r-1) black balls and 1 white ball, (r-2) black balls and
2 white balls, . . ., r white balls, will be drawn from a bag
containing pn black balls and qn white ones, where $(p+q)=1$, when r
balls are drawn in all, none being replaced before the next is drawn.

If the terms of this expansion are represented by ordinates of
which the summits determine a polygon as in the binomial cases,
the corresponding expression for the gradient of the curve at any
point is given by an equation of type (4). We need not go over
the detailed proof of this statement since it follows precisely the
same lines as in the previous cases.

The method of integration of the equation

$$\frac{dy}{dx}=\frac{y(x+b)}{px^2+qx+r}$$

depends upon the nature of the roots of the quadratic in the
denominator, which may be written

$$px^2+qx+r=p\left[\left(x+\frac{q}{2p}\right)^2-\frac{q^2-4pr}{4p^2}\right]=p\left[\left(x+\frac{q}{2p}\right)^2-\frac{4r^2}{q^2}\,\kappa(\kappa-1)\right],$$

where $\kappa=q^2/4pr$, and it is evident that the quadratic splits up
into real factors if $\kappa(\kappa-1)$ is positive. This is the case
when $\kappa$ has any negative value, or when it is positive and
greater than 1, the truth of which may be seen more effectively if the
curve

$$y=\kappa(\kappa-1),$$

a parabola symmetrical about the line $\kappa=\tfrac12$, be drawn,
fig. (29), by plotting y against $\kappa$.
[Fig. (29).]

Further, the product of the roots of the quadratic

$$px^2+qx+r=0$$

is

$$\frac{r}{p}=\frac{4r^2}{q^2}\cdot\frac{q^2}{4pr}=\frac{4r^2}{q^2}\,\kappa,$$

so that the roots when real will be of the same sign if $\kappa$ is
positive and of opposite signs if $\kappa$ is negative. The boundary
lines

$$\kappa=0\quad\text{and}\quad\kappa=1$$

thus divide the whole field into three parts, as shown in fig. (30), in
one of which the roots are real and of opposite sign, in the next
the roots are imaginary, and in the third the roots are real and of
the same sign. At the boundaries we get particular cases as
follows:

$\kappa=0$: this requires $q=0$, since $\kappa=q^2/4pr$, which makes the
roots of the quadratic equal but of opposite sign, unless $p=0$ also,
and in that case both roots are infinite;

$\kappa=1$: the roots are real and equal and of the same sign;

$\kappa=\infty$: this requires $p=0$ or $r=0$; in the former case one
root of the quadratic is infinite, and in the latter one root is zero.

[Fig. (30).]

Thus, returning to the differential equation, the curves which result
from the integration

$$\int\frac{dy}{y}=\int\frac{(x+b)\,dx}{px^2+qx+r}$$

are of different types according to the value of $\kappa$, which is
therefore called the criterion.

Type I. $\kappa$ negative. Roots of $px^2+qx+r=0$ real and of opposite
sign.

In this case we may write

$$px^2+qx+r=p(x+\alpha')(x-\beta'),$$

and so get

$$\int\frac{dy}{y}+\int\frac{(x+b)\,dx}{p(\alpha'+x)(\beta'-x)}=0,$$

or, transferring the origin to the point $(-b, 0)$, the mode, we have

$$\int\frac{dy}{y}+\int\frac{x\,dx}{p(\alpha'-b+x)(\beta'+b-x)}=0,$$

or

$$\int\frac{dy}{y}+\int\frac{x\,dx}{p(\alpha+x)(\beta-x)}=0,$$

where $\alpha=\alpha'-b$, $\beta=\beta'+b$. Therefore,

$$\log y-\frac1p\int\frac{\alpha}{\alpha+\beta}\,\frac{dx}{\alpha+x}+\frac1p\int\frac{\beta}{\alpha+\beta}\,\frac{dx}{\beta-x}+A=0,$$

where A is a constant. Thus

$$\log y=\frac{1}{p(\alpha+\beta)}\,[\alpha\log(\alpha+x)+\beta\log(\beta-x)]+\log B,$$

where B is a constant, whence

$$y=B(\alpha+x)^{\nu\alpha}(\beta-x)^{\nu\beta},$$

or

$$y=y_0\left(1+\frac{x}{\alpha}\right)^{\nu\alpha}\left(1-\frac{x}{\beta}\right)^{\nu\beta},\qquad(5)$$

where $\nu=1/p(\alpha+\beta)$ and $y_0$ is a new constant.

This is a skew curve of limited range, bounded by the lines
$x=-\alpha$ and $x=+\beta$, with the mode at the origin.

Type II. $\kappa=0$. $q=0$, but not $p=0$. Roots of $px^2+qx+r=0$
equal and of opposite sign.

This curve is just a particular case of type I., which reduces to

$$y=y_0\left(1-\frac{x^2}{a^2}\right)^{\nu a},\qquad(6)$$

symmetrical about the axis of y (because for any value of y there
are two values of x, equal and of opposite sign) and of limited
range bounded by $x=-a$ and $x=+a$, with the mode at the origin.

Type III. $\kappa=\infty$.* $p=0$, but not $r=0$. One root of
$px^2+qx+r=0$ infinite.

This is the skew binomial case over again. It may be also deduced
from type I. by making one root, say $\beta'$, tend to infinity.
The curve then takes the form

$$y=y_0\left(1+\frac{x}{\alpha}\right)^{\nu\alpha}\lim_{\beta\to\infty}\left(1-\frac{x}{\beta}\right)^{\nu\beta},$$

because $\beta=\beta'+b$, so that $\beta$ tends to infinity with
$\beta'$. Hence

$$y=y_0\left(1+\frac{x}{\alpha}\right)^{\lambda\alpha}e^{-\lambda x},\qquad(7)$$

where $\lambda$ is the limiting value of $\nu=1/p(\alpha+\beta)$,
a skew curve limited in one direction by the line $x=-\alpha$, with the
mode at the origin.

[* Although theoretically this type corresponds to an infinite value
for $\kappa$, in practice it will as a rule give a reasonable fit
provided $\kappa$ is numerically greater than 4. (See W. P. Elderton's
Frequency Curves and Correlation, p. 50.)]
Type IV. $\kappa$ positive and $<1$. Roots of $px^2+qx+r=0$ imaginary.

Put $\kappa(\kappa-1)=-\lambda^2$, and the differential equation then
leads to

$$\int\frac{dy}{y}=\int\frac{(x+b)\,dx}{p\left[\left(x+\frac{q}{2p}\right)^2+\frac{4r^2\lambda^2}{q^2}\right]}.$$

Transfer the origin to the point $\left(-\frac{q}{2p},\,0\right)$, and
this becomes

$$\int\frac{dy}{y}=\int\frac{\left(x+b-\frac{q}{2p}\right)dx}{p\left[x^2+\frac{4r^2\lambda^2}{q^2}\right]},$$

$$\log y=A+\frac{1}{2p}\log\left(x^2+\frac{4r^2\lambda^2}{q^2}\right)+\frac1p\left(b-\frac{q}{2p}\right)\frac{q}{2r\lambda}\tan^{-1}\frac{qx}{2r\lambda},$$

where A is a constant. Therefore,

$$y=y_0\left(1+\frac{x^2}{a^2}\right)^{-m}e^{-\nu\tan^{-1}(x/a)},\qquad(8)$$

where $a=\frac{2r\lambda}{q}$, $m=-\frac{1}{2p}$,
$\nu=-\frac{1}{ap}\left(b-\frac{q}{2p}\right)$, and $y_0$ is a constant.

This is a skew curve of unlimited range in both directions. The
position of the mode is found by putting $dy/dx=0$ in (8) after
differentiation, or, what comes to the same thing, is seen by direct
reference to the differential equation itself. Thus the distance of the
mode from the origin

$$=-\left(b-\frac{q}{2p}\right)=\nu pa=-\nu a/2m.$$

Type V. $\kappa=1$. Roots of $px^2+qx+r=0$ real and equal.

The equation to integrate becomes

$$\int\frac{dy}{y}=\int\frac{(x+b)\,dx}{p\left(x+\frac{q}{2p}\right)^2}.$$

Transfer the origin to the point $\left(-\frac{q}{2p},\,0\right)$, and
this becomes

$$\int\frac{dy}{y}=\int\frac{x+b-\frac{q}{2p}}{px^2}\,dx,$$

$$\log y=A+\frac1p\log x-\frac1p\left(b-\frac{q}{2p}\right)\frac1x,$$

where A is a constant. Therefore,

$$y=y_0\,x^{1/p}\,e^{-\frac1p\left(b-\frac{q}{2p}\right)\frac1x},$$

or

$$y=y_0\,x^{-s}e^{-\gamma/x},\qquad(9)$$

where $s=-1/p$, $\gamma=\frac1p\left(b-\frac{q}{2p}\right)$, and $y_0$
is a constant.

Here x cannot become negative, so that the curve is skew and
limited in one direction. The distance of the mode from the origin
$=-\left(b-\frac{q}{2p}\right)=\gamma/s$.

Type VI. $\kappa$ positive and $>1$. Roots of $px^2+qx+r=0$ real and of
the same sign.

Equation becomes

$$\int\frac{dy}{y}=\int\frac{(x+b)\,dx}{p(x+\alpha)(x+\beta)},$$

$$\log y=A+\int\left[\frac{b-\alpha}{p(\beta-\alpha)}\cdot\frac{1}{x+\alpha}+\frac{b-\beta}{p(\alpha-\beta)}\cdot\frac{1}{x+\beta}\right]dx$$

$$=A+\frac{1}{p(\beta-\alpha)}\,[(b-\alpha)\log(x+\alpha)-(b-\beta)\log(x+\beta)],$$

where A is a constant; or, transferring the origin to $(-\beta, 0)$,

$$y=y_0\,[x-(\beta-\alpha)]^{(b-\alpha)/p(\beta-\alpha)}\,x^{-(b-\beta)/p(\beta-\alpha)},$$

i.e.

$$y=y_0\,(x-a)^{q_2}\,x^{-q_1},\qquad(10)$$

where $a=\beta-\alpha$, $q_2=(b-\alpha)/p(\beta-\alpha)$,
$q_1=(b-\beta)/p(\beta-\alpha)$, and $y_0$ is a constant.

This is a skew curve bounded by $x=a$ in one direction. The
distance of the mode from the origin $=-(b-\beta)=aq_1/(q_1-q_2)$.
Type VII. $\kappa=0$, $q=0$, $p=0$. Roots of the quadratic
$px^2+qx+r=0$ both infinite.

This is the symmetrical binomial case over again and the integration
reduces to

$$\int\frac{dy}{y}=\int\frac{(x+b)}{r}\,dx,$$

or, transferring the origin to $(-b, 0)$,

$$\int\frac{dy}{y}=\int\frac{x}{r}\,dx,$$

$$\log y=A+\frac{x^2}{2r},$$

where A is a constant. Therefore

$$y=y_0\,e^{-x^2/2\sigma^2},\qquad(11)$$

where $y_0$ is a constant and $\sigma^2=-r$.

This curve, the normal curve of error, is symmetrical about the
axis of y, where mean and mode coincide, and it is of unlimited
range on either side of it.
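
The seven cases may be gathered into a small routine that reads the
type off the criterion. This is a sketch only, with a hypothetical
function name; it follows the conditions exactly as stated in the
headings above, uses exact fractions so that the boundary values
$\kappa=0$ and $\kappa=1$ are recognised reliably, and returns `None`
for combinations to which the text assigns no type.

```python
from fractions import Fraction


def pearson_type(p, q, r):
    """Classify dy/dx = y(x+b)/(p x^2 + q x + r) by kappa = q^2/(4 p r)."""
    if p == 0:
        if q == 0:
            return "VII"                     # both roots infinite
        return "III" if r != 0 else None     # kappa infinite, p = 0 but not r = 0
    if r == 0:
        return None                          # kappa infinite, one root zero
    kappa = Fraction(q) ** 2 / (4 * Fraction(p) * Fraction(r))
    if kappa < 0:
        return "I"       # roots real and of opposite sign
    if kappa == 0:
        return "II"      # q = 0, but not p = 0
    if kappa < 1:
        return "IV"      # roots imaginary
    if kappa == 1:
        return "V"       # roots real and equal
    return "VI"          # roots real and of the same sign
```

In practice the constants p, q, r come from observed moments, so an
exact boundary value will seldom occur; a tolerance would then be
needed, as the footnote to Type III suggests.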


CHAPTER XVI

CURVE FITTING (continued): THE METHOD OF MOMENTS FOR CONNECTING
CURVE AND STATISTICS

We have now completed the first stage of the discussion upon which
we embarked: we have found by the application of general principles
various types of curve, represented by different equations,
which are said to fit more or less satisfactorily a considerable number
at all events of frequency distributions composed of homogeneous
material.

Our next task is to pass from the general to the particular, to
see how to set up a connection between an actually observed
frequency distribution and the appropriate theoretical curve. This
again seems to break up into two parts: (1) to find a way of deciding
which type of curve to adopt in a particular case; (2) to determine
the constants of the curve in terms of the observed statistics; but
since the criterion, $\kappa$, which distinguishes one type of curve
from another is itself a function of the constants of the curve before
integration, it follows that the solution of the first part is
incidental to that of the second.

The  general  method  proposed  for  determination  of  the  constants 
of  the  curve  in  terms  of  the  observed  statistics  is  the  now  well-known 
method  of  moments  due  to  Karl  Pearson,  whereby  the  area  and 
moments  of  the  fitting  curve  are  equated  to  the  area  and  moments, 
calculated  from  the  statistics,  of  the  observation  curve. 

If a frequency table be drawn up (see Table (40)) showing the
number $f$ of observations corresponding to the deviation $x$ of each
value, or group mid-value, $X$ of the character observed from some
fixed value, the expression

$$x_1f_1+x_2f_2+\cdots+x_rf_r+\cdots$$

is called the first moment of the distribution with reference to the
fixed value, which may be termed the origin. Similarly,

$$\Sigma x^2f$$

is called the second moment, $\Sigma x^3f$ the third moment,
$\Sigma x^4f$ the fourth moment, and so on. The following notation will
be found convenient for working purposes:

$$\frac{N'_1}{N}=\frac{\Sigma xf}{N}=\nu'_1,\qquad\frac{N'_2}{N}=\frac{\Sigma x^2f}{N}=\nu'_2,\qquad\ldots$$

Undashed letters are reserved for use when the distribution is
referred to its mean as origin, in other words when the deviations of
the X's are measured from the mean $\bar{X}$.


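Table (40).

Deviation.   Frequency.   First       Second       Third        Fourth
                          Moment.     Moment.      Moment.      Moment.

x_1          f_1          x_1 f_1     x_1^2 f_1    x_1^3 f_1    x_1^4 f_1
x_2          f_2          x_2 f_2     x_2^2 f_2    x_2^3 f_2    x_2^4 f_2
...          ...          ...         ...          ...          ...
x_r          f_r          x_r f_r     x_r^2 f_r    x_r^3 f_r    x_r^4 f_r
...          ...          ...         ...          ...          ...

Totals .     N            N'_1        N'_2         N'_3         N'_4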
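
The column totals of Table (40) are simple to compute. The following
sketch assumes the deviations and frequencies are supplied as two
lists of equal length; the function name and the sample figures in
the usage note are purely illustrative and are not data from the text.

```python
def rough_moments(deviations, frequencies, max_order=4):
    """Return (N, [N'_1, ..., N'_k], [nu'_1, ..., nu'_k]) where
    N'_n = sum(x^n * f) about the working origin and nu'_n = N'_n / N."""
    N = sum(frequencies)
    totals = [sum(x**n * f for x, f in zip(deviations, frequencies))
              for n in range(1, max_order + 1)]
    return N, totals, [t / N for t in totals]
```

With deviations -2, -1, 0, 1, 2 and frequencies 1, 4, 6, 4, 1 the
totals are N = 16, N'_1 = 0, N'_2 = 16, N'_3 = 0 and N'_4 = 40.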

Now each N in the frequency table is the sum of a number of
discrete quantities which only tend to form a continuous series as
the class intervals are made very small and the number of
observations is made very large. The corresponding frequency polygon
or histogram, if we drew it, would at the same time tend to become
a continuous curve, the observation curve. If that limiting stage
were attainable, if we could actually get an infinitely large sample
of observations in which the character observed changed by
infinitesimal amounts, we could then replace the isolated f's of
observation by the corresponding y''s, the ordinates of this
observation curve, and to get the moments we could write instead of
the discrete sums

$$\Sigma f,\quad\Sigma xf,\quad\Sigma x^2f,\quad\ldots,$$

the continuous integral expressions

$$\int y'\,dx,\quad\int xy'\,dx,\quad\int x^2y'\,dx,\quad\ldots,$$

taking in the whole sweep of the curve by integrating throughout
the range of deviation x. We should then have, if areas and
moments are equated according to Pearson's method,

$$\int y\,dx=\int y'\,dx,\quad\int xy\,dx=\int xy'\,dx,\quad\int x^2y\,dx=\int x^2y'\,dx,\quad\ldots,\quad\int x^ny\,dx=\int x^ny'\,dx,$$

where y is the ordinate of the fitting curve corresponding to the
ordinate y' of the observation curve.

In practice, however, it is impossible to go to this limit: we
cannot deal with an infinitely large sample, so we take as large a
sample as is convenient, calculate the rough moments, $N$, $N'_1$,
$N'_2$, . . ., and find approximately what corrections or adjustments
are necessary to obtain the moments of the observation curve, a
procedure which is really equivalent to the determination of the area
of a curve when only a number of isolated points thereon are known.

For  the  full  analytical  justification  of  the  method  of  moments 
the  reader  is  referred  to  Professor  Pearson's  original  paper,  On 
the  Systematic  Fitting  of  Curves  to  Observations  and  Measurements 
[Biometrika,  vol.  i.,  pp.  265  et  seq.  ;  also  vol.  ii.,  pp.  1-23],  where 
it is shown that 'with due precautions as to quadrature, it
gives,  when  one  can  make  a  comparison,  sensibly  as  good  results 
as  the  method  of  least  squares.'  The  latter,  which  is  the  traditional 
way  of  approaching  all  such  problems,  is  shown  to  be  impracticable 
in  a  large  number  of  cases,  either  because  the  resulting  equations 
cannot  be  solved,  or,  when  they  are  capable  of  solution,  because 
the  labour  involved  would  be  colossal. 

Let us consider next how to deduce the area and moments of the
observation curve from the statistics, in other words how to get

$$\int y'\,dx,\quad\int xy'\,dx,\quad\int x^2y'\,dx,\quad\ldots,$$

the integrals being taken throughout the range of the curve, when
we know the frequencies corresponding to only a certain number
of values or elementary ranges of the deviation x.

Now the character observed may be capable of the deviations
actually recorded and of no values in between, e.g. measuring
deviations from 'no rooms' as origin, we might have $f_1$ one-roomed
tenements, $f_2$ two-roomed tenements, $f_3$ three-roomed tenements,
but there could be no such thing as a two-and-a-half or a
three-and-a-quarter-roomed tenement; on the other hand, any recorded
deviation, $x_r$, may be only the mid-value (used as a convenient and
concise approximation) of a group of observations including all in
the continuous range from $(x_r-\tfrac12)$ to $(x_r+\tfrac12)$, where
unit deviation is the class interval: thus we might have $f_6$ males
deviating by +6 in. from 5 ft. (comprising all the males observed
between 5 ft. 5½ in. and 5 ft. 6½ in.), $f_5$ males deviating by
+5 in. from 5 ft. (comprising all males between 5 ft. 4½ in. and
5 ft. 5½ in.), and so on. These two cases must be discussed
separately.

(1) When the observations are centred at definite but isolated values
of x.

The problem is to find

$$\int x^ny'\,dx$$

(the nth moment) when we have no definite curve given but we
know the values of x and y' at a number of isolated points, say

$$(x_0, y'_0),\ (x_1, y'_1),\ \ldots,\ (x_p, y'_p).$$

This is equivalent to discovering a suitable 'quadrature formula,'
i.e. a good approximation to

$$\int z\,dx$$

in terms of known points

$$(x_0, z_0),\ (x_1, z_1),\ (x_2, z_2),\ \ldots,\ (x_p, z_p),$$

where we have written z in place of $x^ny'$, and we may generally
take the ordinates to be at equal distances, h, apart. Several
such formulae have been suggested and they vary according as the
z's are situated at the ends (fig. (31)) or at the centres (fig. (32))
of the h intervals. The second type is perhaps the more useful of
the two, and we shall work out one formula in illustration of it.

[Fig. (31): ordinates at the ends of the h intervals. Fig. (32):
ordinates at the centres of the h intervals.]

Consider the first five of the given points, namely,

$$(x_0, z_0),\ (x_1, z_1),\ \ldots,\ (x_4, z_4).$$

As a simple 'curve of closest contact' let us find the parabola of
type

$$z=c_0+c_1x/h+c_2x^2/h^2+c_3x^3/h^3+c_4x^4/h^4\qquad(1)$$

which goes through these five points, where the c's are constants to
be determined. We may without loss of generality take the axis
of z to coincide with the middle one of the five ordinates, so that
the known points on the curve become

$$(-2h, z_0),\ (-h, z_1),\ (0, z_2),\ (+h, z_3),\ (+2h, z_4),$$

and on substitution in (1) we get

$$z_0=c_0-2c_1+4c_2-8c_3+16c_4,$$
$$z_1=c_0-c_1+c_2-c_3+c_4,$$
$$z_2=c_0,$$
$$z_3=c_0+c_1+c_2+c_3+c_4,$$
$$z_4=c_0+2c_1+4c_2+8c_3+16c_4.$$

[Fig. (33): the five ordinates at -2h, -h, 0, +h, +2h, with the area
between -h/2 and +h/2 shaded.]

These equations are just sufficient uniquely to determine the c's, and
hence the parabolic curve of closest contact, in terms of the five
given points, but for our purpose it is not necessary to find all the
c's. Suppose our object is to find the area of the shaded portion of
fig. (33) in terms of the co-ordinates of the five given points. This
area

$$=\int_{-h/2}^{+h/2}z\,dx=\int_{-h/2}^{+h/2}(c_0+c_1x/h+c_2x^2/h^2+c_3x^3/h^3+c_4x^4/h^4)\,dx$$

$$=c_0h+c_2h/12+c_4h/80.$$
But the equations between the z's and c's at once give

$$z_2=c_0,\quad z_0+z_4=2(c_0+4c_2+16c_4),\quad z_1+z_3=2(c_0+c_2+c_4).$$

Thus

$$2c_2+2c_4=(z_1+z_3)-2z_2,$$
$$8c_2+32c_4=(z_0+z_4)-2z_2.$$

Therefore

$$24c_2=16(z_1+z_3)-(z_0+z_4)-30z_2,$$
$$24c_4=(z_0+z_4)-4(z_1+z_3)+6z_2.$$

Hence, by substitution, the shaded area becomes

$$\int_{-h/2}^{+h/2}z\,dx=h\Big[z_2+\tfrac{1}{288}\{16(z_1+z_3)-(z_0+z_4)-30z_2\}+\tfrac{1}{1920}\{(z_0+z_4)-4(z_1+z_3)+6z_2\}\Big]$$

$$=\frac{h}{5760}\,[5178z_2-17(z_0+z_4)+308(z_1+z_3)],\qquad(2)$$

these particular ordinates being appropriate when the axis of z
coincides with the $z_2$ ordinate.

Similarly, it can be shown that

$$\int_{-h/2}^{+3h/2}z\,dx=\frac{h}{24}\,[27z_0+17z_1+5z_2-z_3],\qquad(3)$$

by finding the parabolic curve of closest contact through $(0, z_0)$,
$(h, z_1)$, $(2h, z_2)$, $(3h, z_3)$, the axis of z coinciding now with
$z_0$.
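
Formulae (2) and (3) may be tested against powers of x, for which the
integrals are known exactly. The sketch below takes h = 1 and places
the ordinates as in the text; the function names are hypothetical.
Formula (2), being built on a quartic, should be exact for powers up
to the fourth, and formula (3), built on a cubic, for powers up to
the third.

```python
from fractions import Fraction


def formula_2(z, h=1):
    """Area from -h/2 to +h/2 given ordinates z[0..4] at -2h, -h, 0, h, 2h."""
    z0, z1, z2, z3, z4 = z
    return Fraction(h, 5760) * (5178 * z2 - 17 * (z0 + z4) + 308 * (z1 + z3))


def formula_3(z, h=1):
    """Area from -h/2 to +3h/2 given ordinates z[0..3] at 0, h, 2h, 3h."""
    z0, z1, z2, z3 = z
    return Fraction(h, 24) * (27 * z0 + 17 * z1 + 5 * z2 - z3)


def exact_power_integral(k, lo, hi):
    """Exact integral of x^k between lo and hi."""
    return (Fraction(hi) ** (k + 1) - Fraction(lo) ** (k + 1)) / (k + 1)
```

For example, with z = x² the five ordinates are 4, 1, 0, 1, 4 and
formula (2) returns 1/12, which is the exact area between -1/2 and
+1/2.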

Now we require

$$\int_{-h/2}^{(p+\frac12)h}z\,dx$$

(see fig. (32)), and this may be obtained by splitting up the integral
thus

$$\int_{-h/2}^{+3h/2}+\int_{3h/2}^{5h/2}+\int_{5h/2}^{7h/2}+\cdots+\int_{(p-\frac52)h}^{(p-\frac32)h}+\int_{(p-\frac32)h}^{(p+\frac12)h}$$

and applying the formulae (2) and (3) to evaluate these sub-integrals.
The first and last come under head (3), while all the rest come
under (2). In fact, we fit together portions of curves of parabolic
type based on the successive groups of points

(0, 1, 2, 3), (0, 1, 2, 3, 4), (1, 2, 3, 4, 5), (2, 3, 4, 5, 6), . . .
(p-4, p-3, p-2, p-1, p), (p-3, p-2, p-1, p),

and as the points overlap, in the sense that neighbouring groups
have points in common, the curves dovetail into one another and
so provide a fairly good approximation to what we want in the way
of integral expressions giving areas based upon the positions of
certain known points.
We have, then:

$$\int_{-h/2}^{3h/2}z\,dx=\frac{h}{24}\,[27z_0+17z_1+5z_2-z_3]$$

$$\int_{3h/2}^{5h/2}z\,dx=\frac{h}{5760}\,[5178z_2-17(z_0+z_4)+308(z_1+z_3)]$$

$$\int_{5h/2}^{7h/2}z\,dx=\frac{h}{5760}\,[5178z_3-17(z_1+z_5)+308(z_2+z_4)]$$

$$\cdots$$

$$\int_{(p-\frac52)h}^{(p-\frac32)h}z\,dx=\frac{h}{5760}\,[5178z_{p-2}-17(z_{p-4}+z_p)+308(z_{p-3}+z_{p-1})]$$

$$\int_{(p-\frac32)h}^{(p+\frac12)h}z\,dx=\frac{h}{24}\,[27z_p+17z_{p-1}+5z_{p-2}-z_{p-3}].$$

Hence, by addition,

$$\int_{-h/2}^{(p+\frac12)h}z\,dx=\frac{h}{5760}\,[6463z_0+4371z_1+6669z_2+5537z_3+5760(z_4+\cdots+z_{p-4})$$

$$\qquad+5537z_{p-3}+6669z_{p-2}+4371z_{p-1}+6463z_p]$$

$$=h\,[1.1220(z_0+z_p)+0.7588(z_1+z_{p-1})+1.1578(z_2+z_{p-2})$$

$$\qquad+0.9613(z_3+z_{p-3})+(z_4+z_5+\cdots+z_{p-4})].$$

In  effect,  since  z—x^y',  this  means  that  to  calculate  the  moments 
from  the  given  statistics  we  may  work  simply  with  the  observed 
ordinates  or  frequencies,  as  drawn  up  in  Table  (40),  so  long  as  we 
modify  the  first  four  and  the  last  four  by  multiplying  them  by 
suitable factors. In particular, when the frequencies at the
beginning and end of the distribution are very small, that is to say,
when  there  is  high  contact  at  each  end  of  the  frequency  curve, 
we  may  dispense  even  with  the  modifying  factors  also  since  we 
may  assume  that  before  the  first  and  after  the  last  ordinate  observed 
there  are  others  which  are  so  small  as  to  be  negligible. 
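
The addition of the sub-integrals can be reproduced mechanically. The
sketch below is an illustration with a hypothetical function name: it
accumulates the coefficients of formula (3) at each end and of formula
(2) centred on each of $z_2,\ldots,z_{p-2}$, and returns the resulting
multipliers as exact fractions. It assumes $p\ge 7$, so that the two
end patterns do not overlap.

```python
from fractions import Fraction


def composite_weights(p):
    """Weights w[0..p] with integral from -h/2 to (p+1/2)h ~ h*sum(w[i]*z[i])."""
    if p < 7:
        raise ValueError("this sketch assumes at least eight ordinates (p >= 7)")
    w = [Fraction(0)] * (p + 1)
    # Formula (3) applied at the beginning and, mirrored, at the end.
    end = [Fraction(27, 24), Fraction(17, 24), Fraction(5, 24), Fraction(-1, 24)]
    for i, coeff in enumerate(end):
        w[i] += coeff
        w[p - i] += coeff
    # Formula (2) centred in turn on z_2, z_3, ..., z_{p-2}.
    five = [Fraction(-17, 5760), Fraction(308, 5760), Fraction(5178, 5760),
            Fraction(308, 5760), Fraction(-17, 5760)]
    for centre in range(2, p - 1):
        for j, coeff in enumerate(five):
            w[centre - 2 + j] += coeff
    return w
```

For p = 10 the first four weights come out as 6463/5760, 4371/5760,
6669/5760 and 5537/5760, the middle ones are exactly 1, and the list
is symmetrical, in agreement with the result of the addition above.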

Thus, given high contact at each extremity of the observation
curve, we may write

$$\int_{-h/2}^{(p+\frac12)h}z\,dx=h\,\Sigma z,$$

or, if we take the class interval as unit in measuring x so that h=1,
this gives

$$\int yx^n\,dx=\Sigma fx^n,$$

where the integral may now be taken as referring to the fitted
curve, since the moments of the theoretical and of the observational
curves are to be equal, and the integration traverses the
extent of the curve. When, however, there is not high contact at
the extremities the same equation holds good if we multiply the
first and last of the observed f's by 1.1220, the second and the last
but one by 0.7588, the third and last but two by 1.1578, and the
fourth and last but three by 0.9613.

In particular, when n=0, integrating throughout the curve,

$$\int y\,dx=\Sigma f=N,\qquad(4)$$

which, being interpreted, means that the area contained between
the fitting curve and the axis of x measures the total frequency of
observations, modified if necessary.

Also, when the observation moments have been adjusted, if we
write $\mu$ and $\mu'$ in place of $\nu$ and $\nu'$ in the notation
previously proposed (see Table (40)), integrating again throughout the
curve,

$$\int xy\,dx\Big/\int y\,dx=\Sigma xf/N=\mu'_1,\qquad(5)$$

and the geometrical interpretation of this is that the foot of the
ordinate passing through the centre of gravity of the area between
the fitting curve and the axis registers the deviation of the mean
$\bar{X}$ from the fixed origin.

If deviations are measured from the mean of the distribution
as origin $\Sigma(xf)$ vanishes (see also Appendix, Note (5)) so that
$\mu_1=0$.
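
The vanishing of the first moment about the mean is readily
illustrated. The sketch below assumes high contact, so that the
frequencies are used without the end multipliers; the function names
and the figures in the usage note are illustrative only.

```python
from fractions import Fraction


def mean_value(values, frequencies):
    """Mean of the X's weighted by their (integer) frequencies."""
    return Fraction(sum(X * f for X, f in zip(values, frequencies)),
                    sum(frequencies))


def moments_about(origin, values, frequencies, max_order=4):
    """Return [m_1, ..., m_k] with m_n = sum((X - origin)^n * f) / N."""
    N = sum(frequencies)
    return [sum((Fraction(X) - origin) ** n * f
                for X, f in zip(values, frequencies)) / N
            for n in range(1, max_order + 1)]
```

With values 1, 2, 3, 4 and frequencies 1, 3, 4, 2 the mean is 27/10;
the first moment about the origin 0 is 27/10, while the first moment
about the mean is exactly zero.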

Generally, we have, with the same limits of integration,

$$\int x^ny\,dx\Big/\int y\,dx=\Sigma x^nf/N=\mu'_n,$$

and when the distribution is referred to its mean as origin the
right-hand side is written $\mu_n$.
We  now  pass  to  the  second  case. 

(2)  When  the  observations  appear  in  groups  ranging  between 
definite  values  of  x,  the  range  of  each  group  as  a  rule  being  the  same 
in  extent. 

Since the usual procedure here is to treat each member of a group
as though it were centred at the x at the middle of that group
(e.g. a group of school girls each of some weight between 7 stone
and 7 stone 5 lbs. would be treated as if all its members were of
weight 7 stone 2.5 lbs.), this case evidently reduces to that already
considered. It is necessary, however, to examine what correction
must be made for assuming that all the members of the same group
have the same x.

Consider again the expression

$$\int x^n y\,dx.$$

The contribution to the nth moment coming from the $f_r$ group of
observations (see fig. (34)) may be taken as the portion of the
above integral between limits $\left(x_0+rh-\frac{h}{2}\right)$ and
$\left(x_0+rh+\frac{h}{2}\right)$, where $x_0$ is the distance of the centre of the
first group from the origin O.

[Fig. (34).]

But, since all the observations in the same group are treated as
if they had the same x, by (2) this integral may be written

$$\frac{h f_r}{5760}\Big[5178\,(x_0+rh)^n - 17\big\{(x_0+(r-2)h)^n + (x_0+(r+2)h)^n\big\} + 308\big\{(x_0+(r-1)h)^n + (x_0+(r+1)h)^n\big\}\Big],$$

where $f_r$ is the frequency of observations in the group, and this, on
expansion in powers of $(x_0+rh)$ and $h$,

$$= h f_r (x_0+rh)^n + \frac{h f_r}{5760}\Big[240\,n(n-1)h^2(x_0+rh)^{n-2} + 3\,n(n-1)(n-2)(n-3)h^4(x_0+rh)^{n-4} + \dots\Big].$$

When we sum for all groups, the expression

$$\sum_{r=0}^{r=p} h f_r (x_0+rh)^n$$

gives evidently the nth moment of a set of isolated variables,
$f_0, f_1, f_2, \dots, f_p$, and by Case (1) it may therefore be taken as
being practically equivalent to the required nth moment of the
observation curve, assuming that there is high contact at each end of
the curve.

The remaining terms,

$$\sum_{r=0}^{r=p} \frac{h f_r}{5760}\Big[240\,n(n-1)h^2(x_0+rh)^{n-2} + 3\,n(n-1)(n-2)(n-3)h^4(x_0+rh)^{n-4} + \dots\Big],$$

may accordingly be taken as the correction required.

When $n=0$, these terms vanish, so we infer, just as in Case (1),
that, when the integration is taken throughout the curve,

$$\int y\,dx = \sum f = N, \tag{4 bis}$$

or, the area between the fitting curve and the axis of x measures
the total frequency of observations when the class interval $h$ is
treated as the unit in measuring x.

Again, when $n=1$, the corrective terms vanish, so we likewise
infer, as in Case (1), that, with the same limits of integration,

$$\int xy\,dx \Big/ \int y\,dx = \sum xf/N = \mu'_1, \tag{5 bis}$$

and that $\mu_1=0$.

When $n=2$, the reduction of the corrective terms gives

$$\text{second unadjusted moment} = \text{second adjusted moment} + \frac{h^2}{12}\sum h f_r,$$

or, dividing throughout by $\sum hf$ and bearing in mind the notation
adopted with the mean as origin,

$$\nu_2 = \mu_2 + \tfrac{1}{12}, \quad\text{i.e.}\quad \mu_2 = \nu_2 - \tfrac{1}{12}, \tag{6}$$

when $h=1$ as before.

When $n=3$,

$$\text{third unadjusted moment} = \text{third adjusted moment} + \frac{h^2}{4}\sum h f_r (x_0+rh);$$

but, if we refer the deviations to the mean of the distribution as
origin, $\sum f_r(x_0+rh)$ vanishes.

Therefore, $$\mu_3 = \nu_3. \tag{7}$$

When $n=4$,

$$\text{fourth unadjusted moment} = \text{fourth adjusted moment} + \frac{h^2}{2}\sum h f_r (x_0+rh)^2 + \frac{h^4}{80}\sum h f_r.$$

Hence, dividing through as before by $\sum hf$ and taking $h$ as 1,

$$\nu_4 = \mu_4 + \tfrac{1}{2}\mu_2 + \tfrac{1}{80}.$$

Therefore, substituting for $\mu_2$ from (6),

$$\mu_4 = \nu_4 - \tfrac{1}{2}\nu_2 + \tfrac{7}{240}. \tag{8}$$

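To sum up, the general procedure in Case (2) is to calculate
$N$, $N\nu'_1$, $N\nu'_2$, $N\nu'_3$, $N\nu'_4$ directly from the statistics and so deduce
$\nu'_1$, $\nu'_2$, $\nu'_3$, $\nu'_4$. Then, transferring the origin to the mean, the $\nu'$'s
become $\nu_1$, $\nu_2$, $\nu_3$, $\nu_4$ (see Appendix, Note 5), and finally the
corrected $\mu$'s are given by

$$\mu_2 = \nu_2 - \tfrac{1}{12}, \qquad \mu_3 = \nu_3, \qquad \mu_4 = \nu_4 - \tfrac{1}{2}\nu_2 + \tfrac{7}{240}.$$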
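The adjustments just summarized can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `sheppard_adjust` and the optional class-interval argument `h` are assumptions introduced here (the text takes h = 1), and the sample figures are the unadjusted moments of the examination-marks example worked later in this chapter.

```python
def sheppard_adjust(nu2, nu3, nu4, h=1.0):
    """Return (mu2, mu3, mu4), the adjusted moments about the mean.

    nu2, nu3, nu4 are the unadjusted moments about the mean; h is the
    class interval in the units used for the deviations (1 in the text).
    Only meaningful when the curve has high contact at both ends.
    """
    mu2 = nu2 - h**2 / 12.0
    mu3 = nu3                      # the third moment needs no correction
    mu4 = nu4 - 0.5 * h**2 * nu2 + 7.0 * h**4 / 240.0
    return mu2, mu3, mu4


# Unadjusted moments of the examination marks (class interval as unit).
mu2, mu3, mu4 = sheppard_adjust(5.42891, 0.29296, 78.79964)
print(f"mu2={mu2:.5f}  mu3={mu3:.5f}  mu4={mu4:.5f}")
```

The results agree with the adjusted values quoted for that example to within rounding in the last figure.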

These adjustments, originally due to Dr. W. F. Sheppard * [Proceedings
of the Lond. Mathl. Socy., vol. xxix., pp. 353 et seq.], are
applicable only when the curve of distribution has high contact at
each extremity as very frequently happens. To this case we shall
confine ourselves, and when it does not hold the unadjusted moments
may be used as a rough approximation failing a more refined but
also a more intricate adjustment.

[* To obtain Sheppard's adjustments we have followed the method indicated
in Elderton's Frequency Curves and Correlation, pp. 28, 29.]

The way in which the three chief kinds of average are related to
the fitting curve is of interest and deserves recapitulation. Whether
the observations are classed as in Case (1) or as in Case (2):

(1) the ordinate drawn through the highest point of the curve,
    since the frequency there is a maximum, fixes the modal
    value of x;

(2) the median x is determined by the ordinate bisecting the
    area between the curve and axis, since there are an equal
    number of observations on either side of it; and

(3) the mean is determined by the ordinate through the centre
    of gravity of the area between the curve and axis.

We  have  still  to  show  how  to  express  the  constants  of  the  fitting 
curve  in  terms  of  the  moments  calculated  from  the  given  statistics,  and 
it  will  be  convenient  now  to  make  our  approach  from  the  other  end. 

Take the general equation of the fitting curve, express its constants
in terms of its moments, and substitute for the latter the
values determined from the statistics, since the basis of the fitting
is the equalization of the moments of the observational curve and
of the theoretical curve. This will enable us to determine $\kappa$, the
criterion for fixing the type of curve suitable to the given distribution.
When the type has been fixed it is, as a rule, not a very
difficult matter to express the constants of the particular type
again in terms of the observational moments.

Now the general differential equation of the fitting curve was

$$\frac{dy}{dx} = \frac{y(x+b)}{px^2+qx+r},$$

hence

$$\int (px^2+qx+r)\,dy = \int y(x+b)\,dx,$$

where the integration is to traverse the complete curve.

Therefore, multiplying both sides by $x^n$,

$$\int (px^{n+2}+qx^{n+1}+rx^n)\,dy = \int (yx^{n+1}+byx^n)\,dx;$$

or, if we integrate the left-hand side by parts,

$$\big[(px^{n+2}+qx^{n+1}+rx^n)\,y\big] - \int y\,\big\{(n+2)\,p\,x^{n+1} + (n+1)\,q\,x^n + n\,r\,x^{n-1}\big\}\,dx = \int (yx^{n+1}+byx^n)\,dx.$$

But the expression in square brackets vanishes at both limits if
we suppose y to be zero at each end of the curve, so that the equation
reduces to

$$\{1+p(n+2)\}\int yx^{n+1}\,dx + \{b+q(n+1)\}\int yx^n\,dx + rn\int yx^{n-1}\,dx = 0. \tag{9}$$

Now if deviations are measured from the mean of the distribution,
we have

$$\int yx\,dx = N\mu_1 = 0, \quad \int yx^2\,dx = N\mu_2, \quad \int yx^3\,dx = N\mu_3, \ \text{etc.},$$

and therefore, putting $n=3$ in the above relation,

$$(1+5p)N\mu_4 + (b+4q)N\mu_3 + 3rN\mu_2 = 0;$$

put $n=2$, $\quad (1+4p)N\mu_3 + (b+3q)N\mu_2 = 0$;

put $n=1$, $\quad (1+3p)N\mu_2 + rN = 0$;

put $n=0$, $\quad (b+q)N = 0$.

Thus $b=-q$, and, on substitution in the other three equations, we get

$$5\mu_4 p + 3\mu_3 q + 3\mu_2 r + \mu_4 = 0,$$
$$4\mu_3 p + 2\mu_2 q + \mu_3 = 0,$$
$$3\mu_2 p + r + \mu_2 = 0,$$

three simple linear equations to find p, q, r, the solution of which
leads to

$$p = -(2\mu_2\mu_4 - 3\mu_3^2 - 6\mu_2^3)/(10\mu_2\mu_4 - 18\mu_2^3 - 12\mu_3^2),$$
$$q = -b = -\mu_3(\mu_4 + 3\mu_2^2)/(10\mu_2\mu_4 - 18\mu_2^3 - 12\mu_3^2),$$
$$r = -\mu_2(4\mu_2\mu_4 - 3\mu_3^2)/(10\mu_2\mu_4 - 18\mu_2^3 - 12\mu_3^2).$$

We have thus expressed p, q, r, and b, the constants of the fitting
curve in terms of the moments of the observed distribution, but the
results may be rendered more concise by writing

$$\beta_1 = \mu_3^2/\mu_2^3, \qquad \beta_2 = \mu_4/\mu_2^2, \tag{10}$$

whence

$$p = -(2\beta_2 - 3\beta_1 - 6)/2(5\beta_2 - 6\beta_1 - 9), \tag{11}$$

$$q = -b = -\sqrt{(\mu_2\beta_1)}\cdot(\beta_2+3)/2(5\beta_2 - 6\beta_1 - 9), \tag{12}$$

$$r = -\mu_2(4\beta_2 - 3\beta_1)/2(5\beta_2 - 6\beta_1 - 9). \tag{13}$$

And $\kappa$, the criterion for fixing the type of curve suitable to the
statistics given, is immediately deduced from

$$\kappa = q^2/4pr = \beta_1(\beta_2+3)^2/4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6). \tag{14}$$

Also, since $\dfrac{dy}{dx}$ vanishes when $x=-b$, this fixes the mode relative
to the origin. But the origin is now at the mean, so that

$$\text{mode} - \text{mean} = -b = -\sqrt{(\mu_2\beta_1)}\cdot(\beta_2+3)/2(5\beta_2 - 6\beta_1 - 9). \tag{15}$$
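As a rough numerical check of formulae (10) to (14), the constants may be computed directly from the moments. The sketch below is illustrative only: the function name `pearson_constants` and the dictionary it returns are assumptions made here, the radical $\sqrt{(\mu_2\beta_1)}$ is evaluated as $\mu_3/\mu_2$ so that q carries the sign of $\mu_3$, and the two sets of moments are those of the worked examples of this chapter (the fourth moment of the second being taken as 18.74817).

```python
def pearson_constants(mu2, mu3, mu4):
    """Constants of dy/dx = y(x+b)/(p x^2 + q x + r) from moments about the mean."""
    beta1 = mu3**2 / mu2**3
    beta2 = mu4 / mu2**2
    denom = 2.0 * (5.0 * beta2 - 6.0 * beta1 - 9.0)
    p = -(2.0 * beta2 - 3.0 * beta1 - 6.0) / denom
    # sqrt(mu2 * beta1) = |mu3| / mu2; using mu3 / mu2 keeps the sign of mu3.
    q = -(mu3 / mu2) * (beta2 + 3.0) / denom
    r = -mu2 * (4.0 * beta2 - 3.0 * beta1) / denom
    # Criterion (14); undefined when a factor of the denominator vanishes.
    kappa = beta1 * (beta2 + 3.0) ** 2 / (
        4.0 * (4.0 * beta2 - 3.0 * beta1) * (2.0 * beta2 - 3.0 * beta1 - 6.0)
    )
    return {"beta1": beta1, "beta2": beta2, "p": p, "q": q, "b": -q,
            "r": r, "kappa": kappa}


# Adjusted moments of the examination marks, and rough moments of the
# unemployed percentages (class interval as unit in both cases).
ex1 = pearson_constants(5.34558, 0.29296, 76.11436)
ex2 = pearson_constants(2.3351543, 3.338395, 18.74817)
print(round(ex1["kappa"], 5), round(ex2["kappa"], 3))
```

A value of kappa close to zero together with beta2 near 3 points to the normal curve, while a negative kappa points to Type I, which is how the two examples are treated in the text.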

And

$$\text{skewness} = (\text{mean} - \text{mode})/\text{S.D.} = b/\sqrt{(\mu_2)} = \sqrt{\beta_1}\,(\beta_2+3)/2(5\beta_2 - 6\beta_1 - 9). \tag{16}$$

APPLICATIONS OF CURVE FITTING

We are in a position now to test the application of these principles
to given frequency distributions and we shall start by trying to
find a curve to fit the record of marks obtained by 514 candidates
in a certain examination (see p. 25).

Example (1). This example is chosen because it turns out,
when we come to evaluate $\kappa$, that it is well fitted by the normal
curve, Type VII, which is one of the simplest and at the same time
the most important of all the types discussed. Before we start
the numerical part of the work it will be well to express the
constants $y_0$ and $\sigma$ of this curve in terms of the moments of the
distribution.

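The equation of the normal curve is

$$y = y_0\,e^{-x^2/2\sigma^2}.$$

If N be the total frequency, we have by equation (4) bis, p. 202,

$$N = \int_{-\infty}^{+\infty} y\,dx = y_0\int_{-\infty}^{+\infty} e^{-x^2/2\sigma^2}\,dx.$$

Put $x^2/2\sigma^2 = \xi^2$, so that $\dfrac{dx}{d\xi} = \sigma\sqrt{2}$ and when $x=\infty$, $\xi=\infty$ also.

Thus

$$N = y_0\,\sigma\sqrt{2}\int_{-\infty}^{+\infty} e^{-\xi^2}\,d\xi = y_0\,\sigma\sqrt{2}\,\sqrt{\pi} \ \text{(see Appendix, Note 8)} = \sqrt{(2\pi)}\,\sigma\,y_0. \tag{1}$$

Again

$$\mu_2 = \int_{-\infty}^{+\infty} yx^2\,dx \Big/ \int_{-\infty}^{+\infty} y\,dx = \frac{2\sqrt{2}\,\sigma^3 y_0}{N}\int_{-\infty}^{+\infty} \xi^2 e^{-\xi^2}\,d\xi$$

$$= \frac{2\sqrt{2}\,\sigma^3 y_0}{N}\Big\{\big[-\tfrac{1}{2}\xi e^{-\xi^2}\big]_{-\infty}^{+\infty} + \tfrac{1}{2}\int_{-\infty}^{+\infty} e^{-\xi^2}\,d\xi\Big\} = \frac{2\sqrt{2}\,\sigma^3 y_0}{N}\cdot\frac{\sqrt{\pi}}{2},$$

since $(\xi e^{-\xi^2})$ vanishes at both limits.

Therefore

$$\mu_2 = \sqrt{2}\,\sigma^3 y_0\sqrt{\pi}/N = \sigma^2, \ \text{by (1)}.$$

In fact, $\sigma$ is simply the S.D. of the distribution.
And $y_0 = N/\sqrt{(2\pi)}\,\sigma$.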

Table (41). Distribution of Marks obtained by 514 Candidates
in a certain Examination.

Mean No.    Deviation   Frequency of   First     Second    Third     Fourth
of Marks.   from 33.    Candidates.    Moment.   Moment.   Moment.   Moment.
            (x)         (f)            (fx)      (fx²)     (fx³)     (fx⁴)

    3         -6            5            -30       180      -1080      6480
    8         -5            9            -45       225      -1125      5625
   13         -4           28           -112       448      -1792      7168
   18         -3           49           -147       441      -1323      3969
   23         -2           58           -116       232       -464       928
   28         -1           82            -82        82        -82        82
   33          0           87             ..        ..         ..        ..
   38         +1           79            +79        79        +79        79
   43         +2           50           +100       200       +400       800
   48         +3           37           +111       333       +999      2997
   53         +4           21            +84       336      +1344      5376
   58         +5            6            +30       150       +750      3750
   63         +6            3            +18       108       +648      3888

   ..         ..          514           -110      2814      -1646    41,142

The first 4 moments referred to 33 as origin and with the class
interval, 5 marks, as unit of deviation, are

-110/514, 2814/514, -1646/514, 41142/514.

The arithmetic mean of the distribution

= 33 + 5(-110/514)
= 33 - 5(0.214008)
= 31.92996.

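The second, third, and fourth moments referred to the mean as
origin, and retaining five marks as unit of deviation, are given
(see Appendix, Note 5), writing $\bar{x} = -110/514$ for the first moment, by

$$\nu_2 = 2814/514 - \bar{x}^2 = 5.42891,$$
$$\nu_3 = -1646/514 - 3\bar{x}\nu_2 - \bar{x}^3 = 0.29296,$$
$$\nu_4 = 41142/514 - 4\bar{x}\nu_3 - 6\bar{x}^2\nu_2 - \bar{x}^4 = 78.79964.$$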
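The column totals of Table (41) and the transfer of the moments to the mean can be retraced with a short script. It is a sketch only: the names `raw_moment_sums` and `central_from_raw` are introduced here for illustration, and the data are the deviations and frequencies of Table (41).

```python
# Deviations from 33 (class interval of 5 marks as unit) and frequencies.
deviations = list(range(-6, 7))
freqs = [5, 9, 28, 49, 58, 82, 87, 79, 50, 37, 21, 6, 3]


def raw_moment_sums(x, f, max_order=4):
    """Return [sum f, sum f*x, ..., sum f*x**max_order]."""
    return [sum(fi * xi**k for xi, fi in zip(x, f)) for k in range(max_order + 1)]


def central_from_raw(n1, n2, n3, n4):
    """Moments about the mean from moments n1..n4 about an arbitrary origin."""
    nu2 = n2 - n1**2
    nu3 = n3 - 3 * n1 * nu2 - n1**3
    nu4 = n4 - 4 * n1 * nu3 - 6 * n1**2 * nu2 - n1**4
    return nu2, nu3, nu4


sums = raw_moment_sums(deviations, freqs)
N = sums[0]
raw = [s / N for s in sums[1:]]      # moments about 33
mean_marks = 33 + 5 * raw[0]         # back to marks
nu2, nu3, nu4 = central_from_raw(*raw)
print(sums, round(mean_marks, 5), round(nu2, 5), round(nu3, 5), round(nu4, 4))
```

The recurrences in `central_from_raw` are those used in the text, each moment being built on the central moment of the order below.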

After making Sheppard's adjustments

$$\mu_2 = \nu_2 - \tfrac{1}{12}, \qquad \mu_3 = \nu_3, \qquad \mu_4 = \nu_4 - \tfrac{1}{2}\nu_2 + \tfrac{7}{240},$$

these become

$$\mu_2 = 5.34558, \qquad \mu_3 = 0.29296, \qquad \mu_4 = 76.11436.$$

Thus $\beta_1 = \mu_3^2/\mu_2^3 = 0.00056$, $\beta_2 = \mu_4/\mu_2^2 = 2.66365$.

Hence

$$\kappa = \beta_1(\beta_2+3)^2/4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)$$
$$= (0.00056)(5.66365)^2/4(10.65292)(-0.67438)$$
$$= -0.00063.$$

Since $\kappa$ and $\beta_1$ are small and $\beta_2$ does not differ greatly from 3, making
p and q small, we may fit a normal curve to this distribution.

The appropriate normal curve is

$$y = y_0\,e^{-x^2/2\sigma^2},$$

where $\sigma^2 = \mu_2 = 5.34558$ (5 marks as unit),

$$y_0 = N/\sqrt{(2\pi\mu_2)} = 514/\sqrt{2\pi(5.34558)} = 88.6903.$$

Hence the required curve has for its equation, writing results to
three significant figures,

$$y = 88.7\,e^{-x^2/10.7}.$$

Now the mean of the distribution is at 31.92996, where the
central ordinate of the normal curve is erected, and the distance
of any x, say $x_{33}$, from this point

= (33 - 31.92996)/5 (expressed with 5 marks as unit)
= 0.214008.

Any other x may be found in the same way and y can then be
deduced from the equation of the curve by taking logs, thus

$$\log_{10} y = \log_{10} 88.6903 - \frac{x^2}{2\sigma^2}\log_{10} e$$
$$= 1.9478762 - (0.0406218)x^2.$$

This enables us to calculate the ordinates of the normal curve and
thence we could evaluate the areas by successive applications of a
suitable quadrature formula.
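The ordinates of the fitted normal curve can be evaluated directly rather than by logarithms. The following is a minimal sketch under the constants found above (N = 514, mean 31.92996 marks, second moment 5.34558 with 5 marks as unit); the function name `normal_ordinate` is an assumption made for illustration.

```python
import math

N = 514
MEAN = 31.92996      # marks
MU2 = 5.34558        # adjusted second moment, 5 marks as unit
UNIT = 5.0           # class interval in marks

y0 = N / math.sqrt(2.0 * math.pi * MU2)


def normal_ordinate(marks):
    """Ordinate of the fitted normal curve at a given number of marks."""
    x = (marks - MEAN) / UNIT            # deviation in class-interval units
    return y0 * math.exp(-x * x / (2.0 * MU2))


for marks in (3, 33, 43):
    print(marks, round(normal_ordinate(marks), 1))
```

The values at 3, 33 and 43 marks may be compared with the ordinates entered in col. (3) of Table (42).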

We can, however, get the areas direct by using a table of the
probability integral, such as that due to Dr. W. F. Sheppard (see
pp. 284, 285). In that case the corresponding abscissae have first
to be expressed in terms of the standard deviation as unit, e.g.

$$x_{40.5} = 40.5 - 31.92996 = 8.57004,$$
and $$\sigma = 5\sqrt{(5.34558)} = 11.56025,$$

where the factor 5 is introduced because 5 marks was the unit in
the calculation of $\mu_2$ (a process equivalent in effect to that previously
adopted).

Thus $x_{40.5}/\sigma = 0.741336 = \xi$, say.

The area of the normal curve up to the abscissa $x/\sigma$ or $\xi$

$$= \int_{-\infty}^{x} y\,dx = \int_{-\infty}^{x} y_0\,e^{-x^2/2\sigma^2}\,dx = N\int_{-\infty}^{\xi} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}\xi^2}\,d\xi = N\int_{-\infty}^{\xi} z\,d\xi = N\cdot\tfrac{1}{2}(1+\alpha),$$

where $\dfrac{\alpha}{2}$ represents the area of the curve $z = \dfrac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}\xi^2}$ between
0 and $\xi$.

Sheppard's Tables give the values of $\tfrac{1}{2}(1+\alpha)$ for different values
of $\xi$, and when

$\xi = 0.74$, $\tfrac{1}{2}(1+\alpha) = 0.7703500$,
$\xi = 0.75$, $\tfrac{1}{2}(1+\alpha) = 0.7733726$.

Therefore, by interpolation, when

$\xi = 0.741336$, $\tfrac{1}{2}(1+\alpha) = 0.7707538$.

Thus the frequency of candidates with marks lying between 0 and
40.5

= 514(0.7707538) = 396.17.

Similarly the frequency of candidates with marks lying between
0 and 45.5 = 452.20.

Hence the normal frequency for the group with 43 as mean
number of marks = 56.0, and the same method gives the area for
any other group.

[Fig. (35).]
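Where a table of the probability integral is not to hand, the same areas can be had from the error function. The sketch below is an illustration only: it assumes the identity $\tfrac{1}{2}(1+\alpha) = \tfrac{1}{2}\{1+\operatorname{erf}(\xi/\sqrt{2})\}$ in place of Sheppard's Tables, and the names `phi` and `cumulative_frequency` are introduced here.

```python
import math

N = 514
MEAN = 31.92996                      # marks
SIGMA = 5.0 * math.sqrt(5.34558)     # standard deviation in marks


def phi(xi):
    """Area of the unit normal curve up to xi, i.e. (1 + alpha) / 2."""
    return 0.5 * (1.0 + math.erf(xi / math.sqrt(2.0)))


def cumulative_frequency(upper_marks):
    """Normal frequency of candidates with marks below upper_marks."""
    return N * phi((upper_marks - MEAN) / SIGMA)


below_40_5 = cumulative_frequency(40.5)
below_45_5 = cumulative_frequency(45.5)
group_43 = below_45_5 - below_40_5   # group with 43 as mean number of marks
print(round(below_40_5, 2), round(below_45_5, 2), round(group_43, 1))
```

The difference of two cumulative frequencies gives the normal frequency of the group lying between them, exactly as in the text.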

The  histogram  of  the  observations  and  the  curve  plotted  from  the 
ordinates  are  shown  together  in  fig.  (35). 

In Table (42) are set out the calculated normal frequency (col. (4))
for each group alongside the corresponding observed frequency
(col. (2)), and the differences between the two are shown in col. (5).
We want to know whether the fit is a good one.

Table (42). Comparison of Observed and Normal
Frequencies in Examination Example.

  (1)         (2)          (3)          (4)         (5)         (6)          (7)
Mean No.    Observed     Normal Frequency.       Deviation.   Sq. of     Ratio of No.
of Marks.   Frequency.   Ordinates.   Areas.                  Deviation. in Col. (6) to
                                                                         No. in Col. (4).

    3           5           3.9         5.7        +0.7        0.49         0.09
    8           9          10.4        10.7        +1.7        2.89         0.27
   13          28          23.2        23.5        -4.5       20.25         0.86
   18          49          42.9        43.1        -5.9       34.81         0.81
   23          58          65.8        65.6        +7.6       57.76         0.88
   28          82          83.7        83.1        +1.1        1.21         0.01
   33          87          88.3        87.6        +0.6        0.36         0.00
   38          79          77.3        76.8        -2.2        4.84         0.06
   43          50          56.1        56.0        +6.0       36.00         0.64
   48          37          33.7        34.0        -3.0        9.00         0.26
   53          21          16.8        17.1        -3.9       15.21         0.89
   58           6           7.0         7.2        +1.2        1.44         0.20
   63           3           2.4         3.5        +0.5        0.25         0.07

   ..         514         511.5       513.9         ..       184.51       χ² = 5.04

Now with this object we might square each difference as in
col. (6), sum the squares, and find the mean square deviation by
dividing by the total frequency; this, after extracting the square
root, would give what might be called the root-mean-square error,
regarding the theoretical values as the true ones. In the above
example it

$$= \sqrt{(184.51/514)} = 0.599.$$

But this form of result, while it may be useful in some cases,
e.g. in comparing two distributions of the same kind to some
theoretical series, is open to objection; for one thing it treats all
the differences as if they were of equal importance in absolute
magnitude, but a difference of 2, say, in a normal frequency of 10
is clearly more serious than a like difference in a frequency of 60.
The objection, however, goes deeper than that; even when the
root-mean-square deviation is found we are at a loss to estimate
its precise relationship to the quality of fit, as there seems to be no
definite connection between one distribution and another of a
different kind: there is no standard case, so to speak, to which we
can always appeal, where the fit is agreed to be good and supplying
therefore a suitable root-mean-square deviation for comparison.

This leads us to the question: What constitutes goodness of
fit? Suppose by some means we have selected a theoretical or
empirical formula to describe a certain frequency distribution in a
given population; if the frequency values observed do not differ
from the theoretical frequencies by more than the deviations we
might expect owing to random sampling, then clearly the fit may be
regarded as a good one. And we have a measure of the fit if we
can find the proportion of random samples, of the same size as the
given distribution, showing greater deviations from the distribution
given by theory than those which are actually observed.

Now Professor Karl Pearson has shown how this proportion can
be calculated [Phil. Mag., vol. l., pp. 157-175 (1900)]; he finds the
probability that a random sample should give a frequency distribution
differing from that which theory proposes by as much as or by
more than the distribution actually observed. This probability, P,
is a function of $\chi^2$, where

$$\chi^2 = \sum\left\{\frac{(y-y')^2}{y}\right\},$$

y and y' representing the theoretical and observed frequencies for
any particular group and the summation is to include all groups.
It will be noted that this expression gives each difference $(y-y')$
its appropriate importance by relating it to the frequency y of its
own group.

A table in Biometrika (vol. i., pp. 155 et seq.) gives the values of P
corresponding to different values of $\chi^2$ (including all integral values
from 1 to 30) and to values of $n'$, the total number of frequency
groups, from 3 to 30 (see also p. 285). The mathematics involved
in finding P is difficult, and the reader who wishes to enter
into it must consult the original memoir, but the utility of the
function has been proved by experience and it is readily applied
in a particular case.

In the above example $\chi^2$ is found from col. (7): it equals 5.04,
and from the table of values of P, when $n'=13$, we have

P = 0.957979 when $\chi^2 = 5$
and P = 0.916082 when $\chi^2 = 6$.

Therefore, by proportional interpolation, when $\chi^2 = 5.04$,
P = 0.956303. Thus, supposing our data to follow the normal curve,
in 956 random samples out of 1000 we should expect to get a
worse-fitting distribution than that given by the sample actually
observed. We may therefore conclude without hesitation that
the normal curve provides an excellent fit in this particular instance.

We pass on now to fresh distributions to illustrate some of the
other types of frequency curve.
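The goodness-of-fit calculation for Example (1) can also be retraced without the printed table of P. The sketch below makes two assumptions that should be kept in view: the tabulated P for $n'$ groups is taken to be the upper tail of the $\chi^2$ distribution with $n'-1$ degrees of freedom (which reproduces the two tabular values quoted above), and the tail is summed from the standard finite series for integer degrees of freedom. The names `chi_square` and `chi_square_tail` are introduced here for illustration.

```python
import math

# Observed frequencies and normal areas, cols. (2) and (4) of Table (42).
observed = [5, 9, 28, 49, 58, 82, 87, 79, 50, 37, 21, 6, 3]
expected = [5.7, 10.7, 23.5, 43.1, 65.6, 83.1, 87.6, 76.8, 56.0, 34.0,
            17.1, 7.2, 3.5]


def chi_square(obs, exp):
    """Sum of (theoretical - observed)^2 / theoretical over all groups."""
    return sum((e - o) ** 2 / e for o, e in zip(obs, exp))


def chi_square_tail(x, dof):
    """P(chi-square with `dof` degrees of freedom exceeds x), dof >= 1."""
    if dof % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, dof // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    total, term = 0.0, 1.0
    for k in range(1, (dof - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * total)


chi2 = chi_square(observed, expected)
p_value = chi_square_tail(chi2, len(observed) - 1)
print(round(chi2, 2), round(p_value, 3))
```

Because the unrounded ratios are summed, the computed $\chi^2$ differs slightly from the 5.04 obtained by adding the two-decimal entries of col. (7), and P is computed directly instead of by proportional interpolation.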

Example (2) deals with the percentage of trade union members
unemployed at the end of each month for the years 1898 to 1912
[data from the Sixteenth Abstract of Labour Statistics of the United
Kingdom, Cd. 7131]. Table (43) shows the distribution of the
180 records according to the percentage unemployed.

The deviations are measured from the centre of the group (3.9-5.2)
as origin, and the class interval (1.3 per cent.) is taken as unit of
deviation as usual.

The first four moments are:

-29/180 (= $\bar{x}$), 425/180, 397/180, 3053/180;
i.e. -0.1611111, 2.3611111, 2.2055556, 16.9611111.
Table (43). Distribution of Unemployed Percentages
of Trade Union Members.

Percentage    Deviation.   Frequency.   First     Second    Third     Fourth
Unemployed.                             Moment.   Moment.   Moment.   Moment.

   0-            -3            0           0         0         0         0
   1.3-          -2           33         -66       132      -264       528
   2.6-          -1           57         -57        57       -57        57
   3.9-          ..           41          ..        ..        ..        ..
   5.2-          +1           24         +24        24       +24        24
   6.5-          +2           10         +20        40       +80       160
   7.8-          +3           11         +33        99      +297       891
   9.1-          +4            3         +12        48      +192       768
  10.4-          +5            1          +5        25      +125       625

   ..            ..          180         -29       425      +397      3053

Referred to the mean,

$$4.55 + 1.3\bar{x} = 4.3405556,$$

the second, third, and fourth moments are (see Appendix, Note 5),

$$\nu_2 = 2.3611111 - \bar{x}^2 = 2.3351543,$$
$$\nu_3 = 2.2055556 - 3\bar{x}\nu_2 - \bar{x}^3 = 3.338395,$$
$$\nu_4 = 16.9611111 - 4\bar{x}\nu_3 - 6\bar{x}^2\nu_2 - \bar{x}^4 = 18.74817.$$

Owing to the very doubtful contact at the beginning of the curve
Sheppard's adjustments were not made in this case, but the rough
moments as calculated above were used.

Thus $\beta_1 = \nu_3^2/\nu_2^3 = 0.875242$,
$\beta_2 = \nu_4/\nu_2^2 = 3.43817$,
and $\kappa = \beta_1(\beta_2+3)^2/4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6) = -0.466$.

Since $\kappa$ is negative the fitting curve should be of Type I., the equation
of which is

$$y = y_0\left(1+\frac{x}{a_1}\right)^{m_1}\left(1-\frac{x}{a_2}\right)^{m_2},$$

where $m_1/a_1 = m_2/a_2$, and $(a_1+a_2) = b$, say.

It is therefore necessary before going further to determine $y_0$, $a_1$,
$a_2$, $b$, $m_1$ and $m_2$ in terms of $\nu_2$, $\nu_3$, $\nu_4$, or $\beta_1$ and $\beta_2$, the constants of
the distribution.

The value of $y_0$ is found to be most conveniently expressed as a
Gamma function which is defined, with the usual notation, thus:

$$\Gamma(k) = \int_0^{\infty} e^{-x}x^{k-1}\,dx,$$

whence it follows that $\Gamma(k+1) = k\,\Gamma(k)$. [See Appendix, Note 9,
also p. 285.]

Also, if

$$B(m, n) = \int_0^1 x^{m-1}(1-x)^{n-1}\,dx,$$

it may be easily shown that

$$B(m, n) = \Gamma(m)\Gamma(n)/\Gamma(m+n). \quad \text{[See Appendix, Note 9.]}$$

The general method of procedure in determining the constants
for all the different types is:

1. Express the fact that the area of the curve is a measure of
   the total frequency of the distribution; this enables us to
   find $y_0$.

2. Find the nth moment of the curve with regard to some fixed
   origin; giving n particular values, 1, 2, 3, 4, this leads to
   the determination of $\mu_2$, $\mu_3$, $\mu_4$, $\beta_1$, $\beta_2$ in terms of the
   constants of the curve, and thence to formulae for calculating
   the constants.

Once found, the same formulae may be used, of course, in all
cases of the same type: we have only to replace letters by the
numbers for which they stand.

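Applying this method to the Type I. curve, we have

$$N = \int_{-a_1}^{+a_2} y\,dx = \frac{y_0}{a_1^{m_1}a_2^{m_2}}\int_{-a_1}^{+a_2} (a_1+x)^{m_1}(a_2-x)^{m_2}\,dx.$$

Put $(a_1+x) = (a_1+a_2)z$, so that $(a_2-x) = (a_1+a_2)(1-z)$ and
$\dfrac{dx}{dz} = (a_1+a_2) = b$; therefore

$$N = \frac{y_0\,b^{m_1+m_2+1}}{a_1^{m_1}a_2^{m_2}}\int_0^1 z^{m_1}(1-z)^{m_2}\,dz \tag{2}$$

$$= \frac{y_0\,b^{m_1+m_2+1}}{a_1^{m_1}a_2^{m_2}}\,B(m_1+1,\ m_2+1).$$

Hence

$$y_0 = \frac{N\,a_1^{m_1}a_2^{m_2}}{b^{m_1+m_2+1}\,B(m_1+1,\ m_2+1)} = \frac{N}{b}\cdot\frac{m_1^{m_1}m_2^{m_2}}{(m_1+m_2)^{m_1+m_2}}\cdot\frac{\Gamma(m_1+m_2+2)}{\Gamma(m_1+1)\,\Gamma(m_2+1)},$$

the second form following because $a_1/m_1 = a_2/m_2 = b/(m_1+m_2)$.

Again,

$$N\mu'_n = \int_{-a_1}^{+a_2} y\,(a_1+x)^n\,dx$$

is the nth moment of the distribution referred to $(-a_1, 0)$, the
point where the curve starts from the axis on the left-hand side,
as origin.

Therefore, as above,

$$N\mu'_n = \frac{y_0\,b^{m_1+m_2+n+1}}{a_1^{m_1}a_2^{m_2}}\int_0^1 z^{m_1+n}(1-z)^{m_2}\,dz = b^n N\int_0^1 z^{m_1+n}(1-z)^{m_2}\,dz \Big/ \int_0^1 z^{m_1}(1-z)^{m_2}\,dz, \ \text{by (2)}.$$

Hence,

$$\mu'_n = b^n\,\Gamma(m_1+n+1)\,\Gamma(m_1+m_2+2)/\Gamma(m_1+1)\,\Gamma(m_1+m_2+n+2)$$

$$= b^n (m_1+n)(m_1+n-1)\dots(m_1+1)/(m_1+m_2+n+1)(m_1+m_2+n)\dots(m_1+m_2+2),$$

by repeated application of the relation $\Gamma(k+1) = k\,\Gamma(k)$.

Putting n = 1, 2, 3, 4 in succession, we have

$$\mu'_1 = b(m_1+1)/(m_1+m_2+2),$$
$$\mu'_2 = b^2(m_1+2)(m_1+1)/(m_1+m_2+3)(m_1+m_2+2),$$
$$\mu'_3 = b^3(m_1+3)(m_1+2)(m_1+1)/(m_1+m_2+4)(m_1+m_2+3)(m_1+m_2+2),$$
$$\mu'_4 = b^4(m_1+4)(m_1+3)(m_1+2)(m_1+1)/(m_1+m_2+5)(m_1+m_2+4)(m_1+m_2+3)(m_1+m_2+2).$$

These relations are rendered more concise if we write

$$m_1+1 = m'_1, \qquad m_2+1 = m'_2, \qquad m_1+m_2+2 = r;$$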
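thus

$$\mu'_1 = b\,m'_1/r,$$
$$\mu'_2 = b^2 m'_1(m'_1+1)/r(r+1),$$
$$\mu'_3 = b^3 m'_1(m'_1+1)(m'_1+2)/r(r+1)(r+2),$$
$$\mu'_4 = b^4 m'_1(m'_1+1)(m'_1+2)(m'_1+3)/r(r+1)(r+2)(r+3).$$

To get the corresponding moments referred to the mean as
origin we have the relations:

$$\mu_1 = 0, \qquad \mu_3 = \mu'_3 - 3\mu'_1\mu_2 - \mu'^3_1,$$
$$\mu_2 = \mu'_2 - \mu'^2_1, \qquad \mu_4 = \mu'_4 - 4\mu'_1\mu_3 - 6\mu'^2_1\mu_2 - \mu'^4_1,$$

which, after some straightforward reduction, give

$$\mu_2 = b^2 m'_1 m'_2/r^2(r+1),$$
$$\mu_3 = 2b^3 m'_1 m'_2(m'_2-m'_1)/r^3(r+1)(r+2),$$
$$\mu_4 = 3b^4 m'_1 m'_2\{m'_1 m'_2(r-6)+2r^2\}/r^4(r+1)(r+2)(r+3).$$

Thus

$$\beta_1 = \mu_3^2/\mu_2^3 = \frac{4b^6 m'^2_1 m'^2_2(m'_2-m'_1)^2}{r^6(r+1)^2(r+2)^2} \Big/ \frac{b^6 m'^3_1 m'^3_2}{r^6(r+1)^3} = 4(m'_2-m'_1)^2(r+1)/m'_1 m'_2(r+2)^2.$$

Therefore, since $(m'_2-m'_1)^2 = r^2 - 4m'_1 m'_2$,

$$\frac{r^2}{m'_1 m'_2} = \frac{\beta_1(r+2)^2}{4(r+1)} + 4. \tag{3}$$

Again

$$\beta_2 = \mu_4/\mu_2^2 = \frac{3b^4 m'_1 m'_2\{m'_1 m'_2(r-6)+2r^2\}}{r^4(r+1)(r+2)(r+3)} \Big/ \frac{b^4 m'^2_1 m'^2_2}{r^4(r+1)^2} = \frac{3\{m'_1 m'_2(r-6)+2r^2\}}{m'_1 m'_2}\cdot\frac{(r+1)}{(r+2)(r+3)}.$$

Therefore,

$$\frac{2r^2}{m'_1 m'_2} = -r + 6 + \frac{\beta_2(r+2)(r+3)}{3(r+1)}. \tag{4}$$

Combining (3) and (4),

$$\frac{\beta_1(r+2)^2}{2(r+1)} + 8 = -r + 6 + \frac{\beta_2(r+2)(r+3)}{3(r+1)},$$

whence

$$r = 6(\beta_2-\beta_1-1)/(3\beta_1-2\beta_2+6). \tag{5}$$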

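Again, since $\mu_2 = b^2 m'_1 m'_2/r^2(r+1)$,
therefore $b^2 = \mu_2(r+1)\cdot[\beta_1(r+2)^2+16(r+1)]/4(r+1)$, by (3),

i.e.

$$b = \tfrac{1}{2}\sqrt{\mu_2[\beta_1(r+2)^2+16(r+1)]}. \tag{6}$$

And $m'_1 m'_2 = 4r^2(r+1)/[\beta_1(r+2)^2+16(r+1)]$,
while $m'_1+m'_2 = r$; hence $m'_1$ and $m'_2$ are roots of

$$m'^2 - r\,m' + \frac{4r^2(r+1)}{\beta_1(r+2)^2+16(r+1)} = 0,$$

the solution of which quadratic is

$$\frac{r}{2} \pm \frac{r}{2}\sqrt{\left[1 - \frac{16(r+1)}{\beta_1(r+2)^2+16(r+1)}\right]};$$

therefore, $m_1$ and $m_2$* are respectively equal to

$$\frac{r-2}{2} \mp \frac{r}{2}\sqrt{\left[1 - \frac{16(r+1)}{\beta_1(r+2)^2+16(r+1)}\right]}, \tag{7}$$

and $a_1$ and $a_2$ follow from

$$\frac{a_1}{m_1} = \frac{a_2}{m_2} = \frac{b}{m_1+m_2}. \tag{8}$$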

Applying these formulae to the 'unemployed' example, we find

r = 5.36048,   $m_1$ = 0.169185,   $m_2$ = 3.191295,
b = 9.33236,   $a_1$ = 0.469842,   $a_2$ = 8.86252.

Also $y_0$ = 58.1282, and the equation of the curve is therefore

$$y = 58.1\left(1+\frac{x}{0.470}\right)^{0.169}\left(1-\frac{x}{8.86}\right)^{3.19}.$$

The position of the origin, which is at the mode, is given by

$$(\text{mean} - \text{mode}) = \mu'_1 - a_1 = \frac{b\,m'_1}{r} - \frac{b\,m_1}{m_1+m_2} = b\left(\frac{m'_1}{r} - \frac{m'_1-1}{r-2}\right) = \frac{b(m'_2-m'_1)}{r(r-2)} = \frac{1}{2}\cdot\frac{\mu_3}{\mu_2}\cdot\frac{r+2}{r-2}; \tag{9}$$

thus, the factor 1.3 converting class intervals into percentages,

$$\text{mode} = 4.3405556 - 1.3\times\frac{1}{2}\cdot\frac{\nu_3}{\nu_2}\cdot\frac{r+2}{r-2},$$

in this particular case,

= 2.3052009.

[* When $\mu_3$ is positive $m_2$ goes with the positive root of the quadratic, and
vice versa.]

This enables us to write down any x, and thence y by substituting
for x in the equation of the curve, which, by taking logs, may be
written

$$\log y = \log y_0 + m_1\log\left(1+\frac{x}{a_1}\right) + m_2\log\left(1-\frac{x}{a_2}\right),$$

e.g. for the x of the group (2.6-3.9), bearing in mind that 1.3 is the
unit of measurement for x, we have

$$x_{3.25} = (3.25 - 2.3052009)/1.3 = 0.9447991/1.3.$$

Hence $\left(1+\dfrac{x_{3.25}}{a_1}\right) = 2.546835$; $\left(1-\dfrac{x_{3.25}}{a_2}\right) = 0.9179953$;

$m_1\log\left(1+\dfrac{x_{3.25}}{a_1}\right) = 0.0686892$; $m_2\log\left(1-\dfrac{x_{3.25}}{a_2}\right) = -0.118587$;

so that $\log y = 1.714489$,
and $y_{3.25} = 51.82$.

Similarly the ordinates at the centre points of the other groups
may be calculated, but it must be remembered that the resulting
values are only a first approximation to the observed frequencies,
and a better series is obtained if, by using some good quadrature
formula, we calculate the areas for the successive groups between
the curve, the bounding ordinates, and the axis of x. Indeed in
the case of the group (1.3-2.6) it is essential to do this, because
(1) the rise of the curve is so very abrupt as to render the determination
of the single ordinate at the centre quite inadequate for
an accurate measure of the frequency in that group, and (2) a
portion of the group falls outside the range of the curve which only
starts at 1.6944063 (i.e. mode $- 1.3a_1$), and this has to be allowed
for in finding the frequency as represented by the area between the
curve and axis.

The base of the required area, range (1.6944063 to 2.6), was
therefore divided into eight equal parts and the ordinates at the
points of division were determined. The area was then found by
using Simpson's well-known formula:

$$\text{Area} = \frac{h}{3}\big[(y_0+y_{2p}) + 2(y_2+y_4+\dots+y_{2p-2}) + 4(y_1+y_3+\dots+y_{2p-1})\big],$$

where h denotes the length of one of the equal parts into which
the base is divided and 2p is their number; in our case p = 4 and
$h = \tfrac{1}{8}$, the class interval being the unit, and the result is to be
reduced in the ratio

0.9055937 : 1.3

in order to allow for the smaller range of this group; we thus get
as the area for the group

$$\frac{0.9055937}{1.3}\times\frac{1}{24}\big[(y_0+y_8) + 2(y_2+y_4+y_6) + 4(y_1+y_3+y_5+y_7)\big] = 37.39.$$
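The ordinate at the centre of a group and the Simpson area for the truncated first group can both be checked numerically. The sketch below uses the constants quoted above for the 'unemployed' example; the names `type1_ordinate` and `simpson` are assumptions made for illustration, and the bracketed factors are clamped at zero so that rounding at the very start of the curve cannot produce a negative base under a fractional power.

```python
Y0, A1, A2 = 58.1282, 0.469842, 8.86252
M1, M2 = 0.169185, 3.191295
MODE, UNIT = 2.3052009, 1.3          # percentages; class interval 1.3


def type1_ordinate(percent):
    """Ordinate of the fitted Type I curve at a given percentage unemployed."""
    x = (percent - MODE) / UNIT                  # class-interval units
    left = max(0.0, 1.0 + x / A1)                # zero at the start of the curve
    right = max(0.0, 1.0 - x / A2)
    return Y0 * left**M1 * right**M2


def simpson(f, lo, hi, intervals):
    """Simpson's rule with an even number of equal intervals."""
    if intervals % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    h = (hi - lo) / intervals
    total = f(lo) + f(hi)
    for k in range(1, intervals):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


start = MODE - UNIT * A1                         # where the curve begins
# Integrate over percentages, then divide by the class interval so that
# the area is expressed as a frequency.
area_first_group = simpson(type1_ordinate, start, 2.6, 8) / UNIT
print(round(type1_ordinate(3.25), 2), round(area_first_group, 2))
```

Dividing the interval (1.6944063 to 2.6) into eight parts and dividing the result by 1.3 is equivalent to the reduction in the ratio 0.9055937 : 1.3 applied in the text.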

The observed and calculated frequencies for the whole series are
compared in Table (44), the remaining areas in col. (4) being calculated
by the simpler but somewhat less accurate form of Simpson's
formula, when only three ordinates are used, namely,

$$\int_{-1}^{+1} y\,dx = \tfrac{1}{3}(y_{-1}+4y_0+y_1).$$

Table (44). Comparison of Observed and Theoretical
Frequencies of Unemployed Percentages.

   (1)          (2)          (3)          (4)         (5)         (6)           (7)
Percentage    Observed    Theoretical Frequency.   Deviation.  Square of    Ratio of No.
Unemployed.   Frequency.  Ordinates.   Areas.                  Deviation.   in Col. (6) to
                                                                            No. in Col. (4).

   1.3-          33          55.3*       37.4        +4.4        19.36          0.52
   2.6-          57          51.8        51.6        -5.4        29.16          0.57
   3.9-          41          37.8        37.8        -3.2        10.24          0.27
   5.2-          24          24.9        25.0        +1.0         1.00          0.04
   6.5-          10          14.8        14.9        +4.9        24.01          1.61
   7.8-          11           7.7         7.8        -3.2        10.24          1.31
   9.1-           3           3.3         3.4        +0.4         0.16          0.05
  10.4-           1           1.0         1.2        +0.2         0.04          0.03

   ..           180           ..        179.1         ..           ..         χ² = 4.40

To  test  the  goodness  of  fit  we  have  n^=S,  ^2—4.49^  whence,  by 
means  of  the  P  table,  P =0-731852.  Thus,  roughly,  we  may  say  that 
three  out  of  every  four  random  samples  of  180  records  would  give  a 
worse  fit  with  the  proposed  curve  than  is  given  by  the  actual  distribu- 
tion observed,  so  that  the  fit  may  be  regarded  as  quite  a  reasonably 
good  one.  This  conclusion  is  also  supported  by  an  examination  of 
the  curve  which  has  been  drawn,  fig.  (36),  with  the  histogram  of 
the  given  statistics. 
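The test of fit can be retraced numerically. The sketch below recomputes $\chi^2$ from the observed frequencies and the theoretical areas of Table (44) and evaluates P from the closed-form series for the chi-square tail instead of a printed table. It assumes that $n'=8$ groups correspond to 7 degrees of freedom; the function names are illustrative only.

```python
import math


def chi_square(observed, theoretical):
    """Sum of (theoretical - observed)^2 / theoretical over the groups."""
    return sum((t - o) ** 2 / t for o, t in zip(observed, theoretical))


def chi_square_tail(chi2, df):
    """Probability that chi-square with `df` degrees of freedom exceeds `chi2`."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    half = chi2 / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r < df/2} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite series in odd powers of chi.
    chi = math.sqrt(chi2)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= chi2 / (2 * r - 1)
        total += term
    return (math.erfc(chi / math.sqrt(2.0))
            + math.sqrt(2.0 / math.pi) * math.exp(-half) * total)


observed = [33, 57, 41, 24, 10, 11, 3, 1]
areas = [37.4, 51.6, 37.8, 25.0, 14.9, 7.8, 3.4, 1.2]

chi2 = chi_square(observed, areas)
# Assumption: n' = 8 groups are entered as 8 - 1 = 7 degrees of freedom.
p_value = chi_square_tail(chi2, len(observed) - 1)
```

The value obtained differs slightly from the tabulated P in the last figures, since the table entries are rounded and the printed P was taken from a table.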

Example (3). The data for this example concerning infectious
diseases  will  be  found  in  Table  (16),  p.  62  (or,  see  p.  224) ;  the 
reader  should  work  out  the  moments  for  himself  and  verify  the 
following  results  : — 

[* The ordinate in this case cannot be accepted as an approximation to the
frequency given by the curve.]

The first four moments referred to 7 as origin are

$$0.282158, \quad 4.86307, \quad 17.4855, \quad 129.394.$$

Referred to the mean, 7.564316, the three latter become

$$\nu_2=4.78346, \quad \nu_3=13.4140, \quad \nu_4=111.964.$$

If we do not assume high contact at the terminals, and certainly at
the lower end it is doubtful, we deduce from the above values of
the moments that

$$\beta_1=1.64396, \quad \beta_2=4.89321, \quad \kappa=-1.53.$$
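The constants $\beta_1$, $\beta_2$ and $\kappa$ follow mechanically from the moments about the mean. The sketch below assumes the usual definitions $\beta_1=\mu_3^2/\mu_2^3$, $\beta_2=\mu_4/\mu_2^2$ and Pearson's criterion $\kappa=\beta_1(\beta_2+3)^2/[4(4\beta_2-3\beta_1)(2\beta_2-3\beta_1-6)]$, which are not restated in this section; the function name is illustrative.

```python
def pearson_criteria(m2, m3, m4):
    """Return (beta1, beta2, kappa) from the 2nd, 3rd and 4th central moments."""
    beta1 = m3 ** 2 / m2 ** 3
    beta2 = m4 / m2 ** 2
    kappa = (beta1 * (beta2 + 3.0) ** 2
             / (4.0 * (4.0 * beta2 - 3.0 * beta1)
                * (2.0 * beta2 - 3.0 * beta1 - 6.0)))
    return beta1, beta2, kappa


# Unadjusted moments of the infectious disease data, Example (3).
beta1, beta2, kappa = pearson_criteria(4.78346, 13.4140, 111.964)
```

A negative $\kappa$, as here, points to Type I.; the same function applied to the other examples of this chapter gives the values of $\kappa$ quoted for them.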

Thus the fitting curve is of Type I. and its constants, when calculated,
are

$$r=11.7819, \quad m_1=0.31171, \quad m_2=9.47020,$$

$$a_1=0.79216, \quad a_2=24.0671, \quad y_0=60.363.$$


[Fig. (36). The fitted curve drawn with the histogram of the observations; horizontal axis: Percentage Unemployed.]

The equation of the curve is therefore, retaining three significant
figures throughout,

$$y=60.4\left(1+\frac{x}{0.792}\right)^{0.312}\left(1-\frac{x}{24.1}\right)^{9.47}.$$

The curve starts at 2.02904 (so that the first group of observations
lies wholly outside its range) and ends at 51.7475. It is drawn,
together with the corresponding histogram, in fig. (37).

Supposing, just for the sake of comparison, we assume high
contact at the terminals and attempt to fit the given distribution
with a Type III. curve, to which Type I. is closely related.

We then have, after making Sheppard's adjustments,

$$\mu_2=4.70013, \quad \mu_3=13.4140, \quad \mu_4=109.601,$$

whence

$$\beta_1=1.73295, \quad \beta_2=4.96129, \quad \kappa=-1.47.$$
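Sheppard's adjustments themselves are not written out in this passage. A minimal sketch, assuming the standard corrections for grouping with class interval $h$ as the working unit ($\mu_2=\nu_2-h^2/12$, $\mu_3=\nu_3$, $\mu_4=\nu_4-\tfrac12h^2\nu_2+\tfrac{7}{240}h^4$), reproduces the adjusted figures quoted in this chapter:

```python
def sheppard_adjust(nu2, nu3, nu4, h=1.0):
    """Apply Sheppard's corrections to central moments of grouped data.

    `h` is the class interval in the working unit (1 when the class
    interval itself is taken as the unit).
    """
    mu2 = nu2 - h * h / 12.0
    mu3 = nu3                      # the third moment needs no correction
    mu4 = nu4 - 0.5 * h * h * nu2 + 7.0 * h ** 4 / 240.0
    return mu2, mu3, mu4


# Example (3): infectious disease rates.
example_3 = sheppard_adjust(4.78346, 13.4140, 111.964)
# Example (4): wages of women tailors.
example_4 = sheppard_adjust(4.746438, 8.60179, 95.6914)
```

The corrections presuppose high contact at both ends of the range, which is why they are applied only where that assumption is made.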

It will be noted that the theoretically correct type to take here
again is Type I., but this was discarded because, when attempted,
it led to a curve starting at a point corresponding to a disease rate
of 3.385, so that the central ordinates of each of the first two
observed groups lay outside the curve altogether.

Type III. curve is of the form

$$y=y_0e^{-\gamma x}\left(1+\frac{x}{a}\right)^{\gamma a}.$$

[Fig. (37). Type I. and Type III. curves drawn with the observed histogram; horizontal axis: Disease Rate per 1000 persons living.]


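To express the constants in terms of the moments, noting that the
curve starts from $x=-a$ on one side and goes off to infinity on the
other, we have

$$\begin{aligned}
N &= \int_{-a}^{\infty} y\,dx \\
  &= y_0\int_{-a}^{\infty} e^{-\gamma x}\left(1+\frac{x}{a}\right)^{\gamma a}dx \\
  &= \frac{y_0}{a^{p}}\int_{-a}^{\infty} e^{-\gamma x}(a+x)^{p}\,dx \quad (\text{where } \gamma a=p) \\
  &= \frac{y_0}{p^{p}}\int_{-a}^{\infty} e^{-\gamma x}(\gamma a+\gamma x)^{p}\,dx \\
  &= \frac{y_0e^{p}}{p^{p}}\int_{-a}^{\infty} e^{-(\gamma a+\gamma x)}(\gamma a+\gamma x)^{p}\,dx \\
  &= \frac{y_0e^{p}}{\gamma p^{p}}\int_{0}^{\infty} e^{-z}z^{p}\,dz \quad (\text{where } \gamma a+\gamma x=z) \\
  &= \frac{y_0e^{p}}{\gamma p^{p}}\,\Gamma(p+1).
\end{aligned}$$

Therefore, since $\gamma=p/a$,

$$y_0=\frac{Np^{p+1}}{a\,e^{p}\,\Gamma(p+1)}. \qquad (10)$$

Again, the $n$th moment of the distribution referred to $(-a, 0)$
as origin is

$$\begin{aligned}
N\mu'_n &= \int_{-a}^{\infty} y\,(a+x)^{n}\,dx \\
        &= y_0\int_{-a}^{\infty} e^{-\gamma x}\left(1+\frac{x}{a}\right)^{p}(a+x)^{n}\,dx \\
        &= \frac{y_0e^{p}}{\gamma^{n+1}p^{p}}\int_{0}^{\infty} e^{-z}z^{p+n}\,dz \\
        &= \frac{y_0e^{p}}{\gamma^{n+1}p^{p}}\,\Gamma(p+n+1).
\end{aligned}$$

Therefore, by (10),

$$\mu'_n=\frac{\Gamma(p+n+1)}{\gamma^{n}\,\Gamma(p+1)}.$$

Hence,

$$\begin{aligned}
\mu'_1 &= \Gamma(p+2)/\gamma\,\Gamma(p+1) = (p+1)/\gamma \\
\mu'_2 &= \Gamma(p+3)/\gamma^{2}\,\Gamma(p+1) = (p+2)(p+1)/\gamma^{2} \\
\mu'_3 &= \Gamma(p+4)/\gamma^{3}\,\Gamma(p+1) = (p+3)(p+2)(p+1)/\gamma^{3}.
\end{aligned}$$

Transferring to the mean as origin we have for the moments, since
$\bar{x}=\mu'_1$,

$$\mu_2=\mu'_2-\bar{x}^{2}=(p+1)/\gamma^{2},$$

$$\mu_3=\mu'_3-3\bar{x}\mu_2-\bar{x}^{3}=2(p+1)/\gamma^{3}.$$

Hence, combining these last two equations,

$$\gamma=2\mu_2/\mu_3, \qquad p=\left(4\mu_2^{3}/\mu_3^{2}\right)-1. \qquad (11)$$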

In our particular case these equations give

$$\gamma=0.700780, \quad p=1.30820, \quad a=1.86678,$$

and, therefore, by (10),

$$y_0=55.3323.$$
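Equations (10) and (11) translate directly into a short calculation. The following sketch (function and variable names are illustrative, and the Gamma function is taken from Python's standard library rather than from tables) recovers the constants just quoted from N = 241 and the adjusted moments.

```python
import math


def fit_type_iii(n_total, mu2, mu3):
    """Constants of a Type III. curve from N and the central moments."""
    gam = 2.0 * mu2 / mu3                      # equation (11)
    p = 4.0 * mu2 ** 3 / mu3 ** 2 - 1.0        # equation (11)
    a = p / gam                                # from gamma * a = p
    y0 = n_total * p ** (p + 1.0) / (a * math.exp(p) * math.gamma(p + 1.0))  # (10)
    return gam, p, a, y0


gam, p, a, y0 = fit_type_iii(241, 4.70013, 13.4140)
```

The results agree with the printed constants to within the rounding of the moments.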

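Hence the curve is

$$y=55.3\,e^{-0.701x}\left(1+\frac{x}{1.87}\right)^{1.31}.$$

The equation of the curve, on taking logs, gives

$$\begin{aligned}
\log y &= \log y_0-\gamma\log_{10}e\cdot x+p\log\left(1+\frac{x}{a}\right) \\
       &= 1.742979-0.304345x+1.30820\log\,(1+x/1.86678).
\end{aligned}$$

Before we can go on to calculate the ordinates of the curve we
need to know where the origin lies, and since it coincides with the
mode it may be found from

$$\begin{aligned}
\text{mean}-\text{mode} &= \mu'_1-a \\
                        &= (p+1)/\gamma-p/\gamma \\
                        &= 1/\gamma. \qquad (12)
\end{aligned}$$

Here $1/\gamma=1.426980$ working units, that is 2.853960 per 1000, the
unit being a rate of 2 per 1000. Thus, mode = 7.564316 - 2.853960 = 4.71036.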


Suppose now we wish to calculate the ordinate corresponding to
the x of the centre point of group (6—8), we have

$$x_7=\tfrac{1}{2}(7-4.71036)=1.14482,$$

bearing in mind that the unit is a rate of 2 per 1000.
Hence, substituting this value in the equation for log y,

$$\log y_7=1.666278, \qquad y_7=46.374,$$

and similarly any other y may be found.
The curve starts at

$$\text{mode}-a=4.71036-2(1.86678)=0.97680,$$

so that the range of the first group as determined from the curve is
(0.9768—2), and not (0—2) as in the observations.
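The ordinate calculation can be repeated without logarithm tables. In the sketch below the constants are the ones printed in the text, the working unit is taken as a rate of 2 per 1000, and the function name is illustrative.

```python
import math

# Constants of the Type III. curve as quoted in the text.
Y0, GAMMA, P, A = 55.3323, 0.700780, 1.30820, 1.86678
MODE = 4.71036   # origin of x, in disease rate per 1000
UNIT = 2.0       # one working unit is a rate of 2 per 1000


def type_iii_ordinate(rate):
    """Ordinate of the fitted curve at a given disease rate per 1000."""
    x = (rate - MODE) / UNIT
    if x <= -A:
        return 0.0   # to the left of the start of the curve
    return Y0 * math.exp(-GAMMA * x) * (1.0 + x / A) ** P


mean_minus_mode = UNIT / GAMMA        # equation (12), in rate units
curve_start = MODE - UNIT * A         # where the curve begins
y7 = type_iii_ordinate(7.0)           # centre of the group (6-8)
```

Ordinates at the centre and extremities of each group, found in this way, are what the quadrature formula needs in order to turn ordinates into group areas.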

The  ordinates  and  afterwards  the  areas,  calculated  by  a  method 
somewhat  similar  to  that  indicated  in  Example  (2),  were  determined 
for  each  separate  group  of  observations,  and  the  results  for  both 
Type  I.  and  Type  III.  curves  are  compared  in  Table  (45). 

Type  III.  curve  is  drawn  on  the  same  diagram,  fig.  (37),  as  Type  I. 
curve  and  the  observation  histogram,  and  the  result  lends  emphasis 
to  an  important  point,  namely,  the  necessity  for  replacing  ordinates 
by  areas  to  obtain  the  frequency  proper  to  any  group. 

In order to get a measure of the goodness of fit in each case,
the function P was calculated, but in the Type I. comparison the
first group had to be omitted to avoid the infinite term which would
have resulted in $\chi^2$ owing to this group falling right outside the
curve, that is to say, the test had to be confined to towns in which
the observed case rate was not less than 2. The values found for
P were:

Type I.: P = 0.34307,
Type III.: P = 0.46298,

so that in every 100 samples containing 241 observations each, we
should get, roughly, 34 deviating from the Type I. curve and 46
deviating from the Type III. curve, at least as widely as the given
distribution. In neither case can the fit be regarded as a very
good  one,  but  the  failure  is  only  marked  in  one  or  two  groups,  such 
as  that  of  maximum  frequency,  where  there  may  be  other  than 
random  causes  to  account  for  it ;  e.g.  where  isolation  is  inefficient 
the  disease  is  likely  to  spread,  one  case  infects  another  :  in  other 
words,  the  events  are  not  independent. 


Table (45). Comparison of Observed Distribution of Infectious
Disease Rates, notified in 241 large Towns of
England and Wales, with Theoretical Distribution.

    (1)          (2)          (3)         (4)           (5)             (6)
                Observed     Theoretical Frequency.
 Case Rate.     Frequency.   Type I.     Type III.   (f1-f)²/f1.    (f3-f)²/f3.
                  (f)         (f1)        (f3)
    0—             5           ..          6.6           ..            0.39
    2—            39          52.6        43.7          3.52           0.51
    4—            69          55.4        54.3          3.34           3.98
    6—            41          43.2        46.2          0.11           0.59
    8—            29          31.2        33.6          0.15           0.63
   10—            22          21.5        22.4          0.01           0.01
   12—            16          14.2        14.1          0.23           0.26
   14—             7           9.1         8.6          0.48           0.30
   16—             5           5.6         5.1          0.06           0.00
   18—             3           3.3         2.9          0.03           0.00
   20—             4           1.9         1.7          2.32           3.11
   22—             0           1.0         0.9          1.00           0.90
   24—             0           0.5         0.5          0.50           0.50
   26—             1           0.3         0.3          1.63           1.63

   Total         241         239.8       240.9       χ²₁ = 13.38    χ²₃ = 12.81

Example (4) refers to the wages of certain women tailors previously
recorded in Table (II), p. 41. The data as given in the
original suffered a disadvantage common to such statistics: at
either end the grouping differed from that in the centre, two or three
classes being lumped together owing to the smallness of frequency
in each. The figures ran thus: Under 5s., 19; 5s. and under 6s.,
180; 6s. and under 7s., 384; ...; 23s. and under 24s., 64;
24s. and under 25s., 54; 25s. and under 30s., 122; 30s. and over,
36. They were recast in the form shown in Table (46), suggested
by an examination of the histogram, in order to make the fitting
simpler.

The first four moments calculated from this adapted table and
referred to 12s. as origin are:

$$\nu'_1=0.556718, \quad \nu'_2=5.056373, \quad \nu'_3=16.70163, \quad \nu'_4=123.7691.$$

When referred to the mean, 13.113436, the last three become

$$\nu_2=4.746438, \quad \nu_3=8.60179, \quad \nu_4=95.6914;$$

or, after making Sheppard's adjustments,

$$\mu_2=4.663105, \quad \mu_3=8.60179, \quad \mu_4=93.3474;$$

therefore,

$$\beta_1=0.729713, \quad \beta_2=4.29291, \quad \kappa=1.63.$$

The curve is thus of Type VI.,

$$y=y_0(x-a)^{q_2}/x^{q_1}.$$

To calculate the constants, the $n$th moment about the origin is
given by

$$\begin{aligned}
N\mu'_n &= \int_{a}^{\infty} y\,x^{n}\,dx \\
        &= y_0\int_{a}^{\infty}(x-a)^{q_2}x^{n-q_1}\,dx \\
        &= \frac{y_0}{a^{q_1-q_2-n-1}}\int_{0}^{1} z^{q_1-q_2-n-2}(1-z)^{q_2}\,dz \quad \left(\text{where } z=\frac{a}{x}\right) \\
        &= \frac{y_0}{a^{q_1-q_2-n-1}}\,B(q_1-q_2-n-1,\; q_2+1).
\end{aligned}$$

Thus, putting $n=0$,

$$N=\frac{y_0}{a^{q_1-q_2-1}}\,B(q_1-q_2-1,\; q_2+1), \qquad (13)$$

and

$$\mu'_n=a^{n}\,\Gamma(q_1-q_2-n-1)\,\Gamma(q_1)/\Gamma(q_1-n)\,\Gamma(q_1-q_2-1);$$

therefore,

$$\mu'_1=a\,\Gamma(q_1-q_2-2)\,\Gamma(q_1)/\Gamma(q_1-1)\,\Gamma(q_1-q_2-1)=a(q_1-1)/(q_1-q_2-2).$$

Also

$$\mu'_n/\mu'_{n-1}=a\,\Gamma(q_1-q_2-n-1)\,\Gamma(q_1-n+1)/\Gamma(q_1-n)\,\Gamma(q_1-q_2-n)=a(q_1-n)/(q_1-q_2-n-1).$$

Hence

$$\mu'_2=a^{2}(q_1-1)(q_1-2)/(q_1-q_2-2)(q_1-q_2-3),$$

$$\mu'_3=a^{3}(q_1-1)(q_1-2)(q_1-3)/(q_1-q_2-2)(q_1-q_2-3)(q_1-q_2-4).$$

But these relations are precisely the same as those of Type I. with $a$
in place of $b$, $-q_1$ in place of $m_1$, and $q_2$ in place of $m_2$, so that
$(1+q_2)$, $(1-q_1)$* are the roots of

$$q^{2}-rq+4r^{2}(r+1)/[\beta_1(r+2)^{2}+16(r+1)]=0, \qquad (14)$$

where

$$r=6(\beta_2-\beta_1-1)/(6+3\beta_1-2\beta_2). \qquad (15)$$

Also

$$y_0=Na^{q_1-q_2-1}\,\Gamma(q_1)/\Gamma(q_1-q_2-1)\,\Gamma(q_2+1), \text{ by (13)}, \qquad (16)$$

and $a$ is given by

$$\mu_2=a^{2}(1-q_1)(1+q_2)/r^{2}(r+1), \qquad (17)$$

$\mu_2$ being the second moment of the given distribution referred to
its mean as origin.

The distance of the mean from the origin is

$$\mu'_1=a(q_1-1)/(q_1-q_2-2),$$

and this fixes the origin, for the mean is known directly from the
statistics.

To get the mode, use the equation of the curve, putting $\dfrac{dy}{dx}=0$,
and we have

$$\text{origin}=\text{mode}-aq_1/(q_1-q_2).$$

Combining this with

$$\text{origin}=\text{mean}-a(q_1-1)/(q_1-q_2-2)$$

we have

$$\text{mean}-\text{mode}=a(q_1+q_2)/(q_1-q_2)(q_1-q_2-2). \qquad (18)$$
Applying these formulae to the case of the women tailors,

$$r=-38.7698, \quad q_1=51.5269, \quad q_2=10.7571, \quad a=21.11018,$$

and the equation of the curve is

$$y=y_0(x-21.1)^{10.8}/x^{51.5},$$

where $\log y_0=68.8254$.

Also the origin is at -41.9104, the mode at 11.4498, and the maximum
theoretical frequency is 2299.
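The chain of formulae (14), (15) and (17) can be followed numerically. The sketch below assumes that $\mu_3$ is positive, so that $(1+q_2)$ goes with the positive root of the quadratic, and that the working unit is 2s. (inferred from the mean, 13.113436 = 12 + 2 × 0.556718); the function name is illustrative.

```python
import math


def fit_type_vi(beta1, beta2, mu2):
    """Return (r, q1, q2, a) for a Type VI. curve; assumes mu3 > 0."""
    r = 6.0 * (beta2 - beta1 - 1.0) / (6.0 + 3.0 * beta1 - 2.0 * beta2)  # (15)
    # Constant term of the quadratic (14), i.e. the product of its roots.
    product = (4.0 * r * r * (r + 1.0)
               / (beta1 * (r + 2.0) ** 2 + 16.0 * (r + 1.0)))
    root = math.sqrt(r * r - 4.0 * product)
    one_plus_q2 = (r + root) / 2.0     # positive root, since mu3 > 0
    one_minus_q1 = (r - root) / 2.0    # negative root
    q1 = 1.0 - one_minus_q1
    q2 = one_plus_q2 - 1.0
    a = math.sqrt(mu2 * r * r * (r + 1.0) / product)                     # (17)
    return r, q1, q2, a


MEAN = 13.113436   # shillings
UNIT = 2.0         # assumed working unit of 2s.

r, q1, q2, a = fit_type_vi(0.729713, 4.29291, 4.663105)
origin = MEAN - UNIT * a * (q1 - 1.0) / (q1 - q2 - 2.0)
mode = origin + UNIT * a * q1 / (q1 - q2)
```

Because the denominator of (15) is small in this example, $r$ is sensitive to the last figures of $\beta_1$ and $\beta_2$, so agreement with the printed constants is to about four significant figures only.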

[* When $\mu_3$ is positive $(1+q_2)$ goes with the positive root of the quadratic, and
vice versa.]

Table (46). Distribution of Wages of certain
Women Tailors, Actual and Theoretical.

                   Frequency.
 Wages.        Actual.    Theoretical.
   1s.—            5            1
   3s.—           14           52
   5s.—          564          462
   7s.—         1243         1332
   9s.—         2045         2096
  11s.—         2339         2255
  13s.—         1815         1898
  15s.—         1432         1353
  17s.—          854          859
  19s.—          523          503
  21s.—          262          278
  23s.—          118          147
  25s.—           64           75
  27s.—           43           38
  29s.—           27           19
  31s.—           15            9
  33s.—            9            5

  Total       11,372       11,372

The theoretical and actual frequencies are compared in Table (46)
and the curve is drawn with the histogram in fig. (38).

Example (5) discusses the distribution of frequencies of specimens
of Anemone nemorosa with different numbers of sepals, recorded by
G. U. Yule (Biometrika, vol. i., p. 307).

The first four moments referred to 6 as origin are

$$\nu'_1=0.508, \quad \nu'_2=1.012, \quad \nu'_3=2.476, \quad \nu'_4=9.124.$$

Referred to the mean, 6.508, the last three become

$$\nu_2=0.7539360, \quad \nu_3=1.195905, \quad \nu_4=5.459941.$$

The contact, at one extremity certainly, being doubtful, Sheppard's
adjustments were not made in this case. Hence,

$$\beta_1=3.337259, \quad \beta_2=9.605476, \quad \kappa=1.46.$$

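Since $\kappa$ does not differ greatly from unity an attempt was made to
fit the observations with a Type V. curve, namely,

$$y=y_0x^{-p}e^{-\gamma/x}.$$

The $n$th moment about the origin is given by

$$N\mu'_n=\int_{0}^{\infty} y\,x^{n}\,dx$$

(since, $p$ and $\gamma$ being positive, $y$ vanishes at $x=0$ and at $x=\infty$)

$$\begin{aligned}
 &= y_0\gamma^{\,n-p+1}\int_{0}^{\infty} z^{p-n-2}e^{-z}\,dz \quad (\text{where } z=\gamma/x) \\
 &= y_0\gamma^{\,n-p+1}\,\Gamma(p-n-1).
\end{aligned}$$

Thus

$$N=y_0\gamma^{-p+1}\,\Gamma(p-1).$$

And

$$\mu'_n/\mu'_{n-1}=\gamma/(p-n-1).$$

Hence

$$\begin{aligned}
\mu'_1 &= \gamma/(p-2) \\
\mu'_2 &= \gamma^{2}/(p-2)(p-3) \\
\mu'_3 &= \gamma^{3}/(p-2)(p-3)(p-4).
\end{aligned}$$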

Referred to the mean as origin, the last two moments become

$$\mu_2=\gamma^{2}/(p-2)^{2}(p-3),$$

$$\mu_3=4\gamma^{3}/(p-2)^{3}(p-3)(p-4),$$

whence

$$\beta_1=\mu_3^{2}/\mu_2^{3}=16(p-3)/(p-4)^{2};$$

this gives a quadratic for $(p-4)$, one solution of which is

$$p-4=[8+4\sqrt{(4+\beta_1)}]/\beta_1, \qquad (19)$$

the positive root being taken in order to get a real $\gamma$.

Thus

$$\gamma^{*}=(p-2)\sqrt{[(p-3)\mu_2]} \qquad (20)$$

and

$$y_0=N\gamma^{\,p-1}/\Gamma(p-1). \qquad (21)$$

Since $\mu'_1=\gamma/(p-2)$, the position of the origin is given by

$$\text{Origin}=\text{Mean}-\gamma/(p-2). \qquad (22)$$

Also the distance of the mode from the origin is $\gamma/p$, so that all
the constants of the curve are readily determined.

[* The sign of $\gamma$ is taken to be the same as that of $\mu_3$.]

In our particular case, we get

$$p=9.643840, \quad \gamma=17.10768,$$

and the curve is

$$y=y_0x^{-9.64}e^{-17.1/x},$$

where $\log y_0=9.38179$. The origin is at 4.27 and the mode at 6.04.
The greatest frequency is 620 approximately, and the frequency
distribution, calculating areas for the several groups as if they ranged
between (4.5—5.5), (5.5—6.5), etc., is shown alongside the observed
distribution in Table (47). The curve is plotted in fig. (39) from the
ordinates which were calculated at the centre and extremities of
each group so as to enable Simpson's simple quadrature formula
to be used to get the areas.
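Equations (19) to (22) can be run through for the anemone data as a check on the constants. The sketch below takes N = 1000, the mean 6.508, and the unadjusted $\mu_2$ and $\beta_1$ quoted above; the function name is illustrative, and the log-Gamma function comes from Python's standard library rather than from tables.

```python
import math


def fit_type_v(n_total, mean, mu2, beta1):
    """Return (p, gamma, log10 y0, origin, mode) for a Type V. curve."""
    p = 4.0 + (8.0 + 4.0 * math.sqrt(4.0 + beta1)) / beta1         # (19)
    gam = (p - 2.0) * math.sqrt((p - 3.0) * mu2)                   # (20)
    # (21) in logarithmic form, to avoid very large intermediate numbers.
    log10_y0 = (math.log10(n_total) + (p - 1.0) * math.log10(gam)
                - math.lgamma(p - 1.0) / math.log(10.0))
    origin = mean - gam / (p - 2.0)                                # (22)
    mode = origin + gam / p
    return p, gam, log10_y0, origin, mode


p, gam, log10_y0, origin, mode = fit_type_v(1000, 6.508, 0.7539360, 3.337259)

# Ordinate at the mode: the greatest frequency of the fitted curve.
x_mode = gam / p
greatest = 10.0 ** (log10_y0 - p * math.log10(x_mode)
                    - (gam / x_mode) * math.log10(math.e))
```

The greatest frequency found in this way comes out close to the 620 stated in the text.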


Table (47). Distribution of Sepals of Anemone
Nemorosa, observed and calculated.

  No. of               Frequency.
  Sepals.        Observed.    Calculated.
     5              34            51
     6             576           544
     7             276           296
     8              92            81
     9              14            22
    10               4             6
    11               0             2
    12               4             1

   Total          1000          1003

[Examples  have  been  given  above  of  five  out  of  the  seven  different  types 
of  frequency  curve  that  have  been  enumerated.  For  further  examples  of 
all  the  types  and  a  complete  account  of  the  method  reference  should  be 
made  to  Professor  Pearson's  memoirs,  especially  the  following  : — 

Roy. Soc. Phil. Trans., vol. 186a, pp. 343-414 (1895), On Skew Variation
in Homogeneous Material; and a Supplementary Memoir in vol. 197a, pp. 443-459
(1901).

Biometrika, vol. i., pp. 265 et seq., On the Systematic Fitting of Curves to
Observations and Measurements, continued in vol. ii., pp. 1-23. Also vol. iv.,
pp. 169-212, which discusses various historical hypotheses made to generalize
the Gaussian Law, the basis of the symmetrical normal curve.

A  large  number  of  highly  interesting  practical  illustrations  of  Pearsonian 
curve  fitting  occur  throughout  the  pages  of  Biometrika,  while  W.  P.  Elderton's 
Frequency  Curves  and  Correlation  contains  an  admirably  concise  treatment  of 
the  theory,  with  applications  to  meet  more  particularly  the  actuarial  point 
of  view. 

It  should  be  stated  that  rival  curves  and  methods  have  been  proposed  as 
suitable  for  fitting  certain  types  of  frequency  distribution,  some  of  which  have 
scarcely  received  the  attention  and  the  trial  they  deserve.  Among  the  most 
interesting  are  those  developed  by  Professor  Edgeworth ;  for  some  account  of 
his  voluminous  work  upon  the  subject  the  reader  may  refer  to  several  memoirs 
in  the  Journal  of  the  Royal  Statistical  Society,  beginning  December  1898 
(the  Method  of  Translation),  among  which  the  following  are  important  as 
giving  more  recent  results  of  his  researches  : — 

Vol.  Ixix.  (1906),  The  Generalized  Law  of  Error  or  Law  of  Great  Numbers. 

Vol.  Ixxvii.  (1914),  On  the  Use  of  Analytical  Geometry  to  Represent  Certain 
Kinds  of  Statistics. 

Vol.  Ixxix.  (1916),  On  the  Mathematical  Representations  of  Statistical  Data; 
continued  in  vol.  Ixxx.  (1917). 

Two  memoirs  may  be  cited  as  of  particular  interest — those  of  May  1917 
and  March  1918 — because  they  reply  to  criticism  and  draw  a  comparison  from 
their  author's  point  of  view  between  his  curves  and  those  of  Professor  Pearson.] 


CHAPTER    XVIII 

THE  NORMAL  CURVE  OF  ERROR 

Let us return for a moment to the general statement on p. 143,
that 'whenever we have $n$ similar but independent events happening
in which the probability of success for each is $p$, the different
resulting possibilities as to success are given by the successive
terms in $(s+f)^n$, namely,

$$s^{n}, \quad ns^{n-1}f, \quad \frac{n(n-1)}{2!}s^{n-2}f^{2}, \quad \ldots,$$

and their correspondent probabilities by the successive terms in
$(p+q)^n$, namely,

$$p^{n}, \quad np^{n-1}q, \quad \frac{n(n-1)}{2!}p^{n-2}q^{2}, \quad \ldots\text{'}$$

When  we  come  to  try  and  apply  this  theory  directly  to  cases 
other  than  those  of  random  sampling  in  artificial  experiments  with 
coins,  dice,  etc.,  we  are  faced  at  once  with  difficulties  because  of 
the  limiting  character  of  the  assumption  on  which  the  theory  rests, 
namely, that all the events are to be similar and independent. The
similarity  demanded  is  of  the  same  radical  type  as  that  existing 
when  we  throw  the  same  die  or  spin  the  same  coin  twice  running, 
and  the  test  for  it  is  that  p,  the  chance  of  success,  is  to  be  the  same 
for  every  individual  event.  The  independence  is  to  be  such  that 
no  single  event  and  no  combination  of  events  is  to  have  any  influence 
upon  any  of  the  rest. 

Now for most classes of events it is impossible to assign any
a priori value to $p$ at all, still less can we be sure that $p$ does not
change from one event to the next. For example, the chance of
death for soldiers in war-time varies from regiment to regiment
according to where they happen to be located; for the same regiment
it varies from battalion to battalion according to whether
they are in the trenches or behind the lines; and from individual
to individual according to innumerable little accidents of time, place,
and condition. Also, where the shells burst thickest, $p$ increases
for any soldier there, but it increases also for his neighbour. Thus
the events in such a case are not similar, neither are they independent.

Moreover, as it stands, the theory cannot be applied to any
distribution in which the character observed is capable of continuous
variation. This difficulty, however, has been overcome, as we
have seen, by replacing the histogram representative of the binomial
by a continuous curve which at the same time serves to describe
the discontinuous series to a high degree of accuracy.

To illustrate how close this description can be, even when $n$
is comparatively small, we will fit with its appropriate normal
curve the symmetrical binomial polygon formed by joining up
the summits of the ordinates representing successive terms of
the series

$$2^{10}\left(\tfrac{1}{2}+\tfrac{1}{2}\right)^{10},$$

erected at unit distance apart.
The total area bounded by the polygon, the extreme ordinates,
and the axis of $x$ is practically

$$\begin{aligned}
 &= (y_0+y_1+y_2+\ldots+y'_1+y'_2+\ldots)\times(1) \\
 &= \text{sum of the given ordinates} \\
 &= 1024.
\end{aligned}$$

The equation of the normal curve is

$$y=Y_0e^{-x^{2}/2\sigma^{2}},$$

where $2\sigma^{2}=5.5$ and

$$Y_0=N/\sqrt{2\pi}\cdot\sigma=1024/\sqrt{(5.5\pi)}.$$

Hence, taking logs, we have

$$\begin{aligned}
\log y &= \log Y_0-\frac{x^{2}}{2\sigma^{2}}\log_{10}e \\
       &= 2.3915437-x^{2}(0.0789626).
\end{aligned}$$

It is easy from this equation to calculate the normal curve ordinates
corresponding to x = 0, 1, 2, 3, 4, 5, and the results, compared with
the polygon ordinates, are as follows:

    x      Ordinate of Polygon.    Normal Curve Ordinate.
    0             252                     246.3
   ±1             210                     205.4
   ±2             120                     119.0
   ±3              45                      48.0
   ±4              10                      13.4
   ±5               1                       2.6
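The comparison is quickly reproduced. In the sketch below the polygon ordinates are the binomial coefficients of $2^{10}(\tfrac12+\tfrac12)^{10}$ and the normal ordinates use $2\sigma^2=5.5$, the value implied by $Y_0=1024/\sqrt{(5.5\pi)}$; the names are illustrative.

```python
import math

N_EVENTS = 10
TOTAL = 2 ** N_EVENTS        # sum of the polygon ordinates, 1024
TWO_SIGMA_SQ = 5.5           # implied by Y0 = 1024 / sqrt(5.5 * pi)


def polygon_ordinate(x):
    """Term of 2^10 (1/2 + 1/2)^10 at distance x from the central term."""
    return math.comb(N_EVENTS, N_EVENTS // 2 + x)


def normal_ordinate(x):
    """Ordinate of the fitted normal curve at distance x from the mean."""
    y0 = TOTAL / math.sqrt(math.pi * TWO_SIGMA_SQ)
    return y0 * math.exp(-x * x / TWO_SIGMA_SQ)


comparison = [(x, polygon_ordinate(x), round(normal_ordinate(x), 1))
              for x in range(6)]
```

Each entry of `comparison` holds x, the polygon ordinate and the normal curve ordinate, in the same order as the columns of the table.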

Now although the circumstances in which the series

$$(p+q)^{n}$$

may be taken to represent the frequency distribution resulting
from  a  particular  kind  of  experiment  were  so  stringently  defined, 
there  is  no  reason  why  the  normal  curve  itself  to  which  the  theory 
led  should  be  subjected  to  precisely  the  same  limitations.  After 
all,  the  real  and  only  justification  for  choosing  one  curve  rather 
than  another  to  fit  any  given  observations  is  that  it  does  succeed 
in  fitting  them  better.  But  when  the  further  question  is  asked 
why  the  normal  curve  should  succeed  in  describing  some  results 
so  well,  we  must  not  be  tempted  by  analogy  to  rush  to  the  con- 
clusion that  the  causes  at  work  are  necessarily  independent,  and 
equal,  and  so  on.  In  short,  the  theoretical  justification  and  the 
empirical  use  of  the  normal  curve  are  two  quite  different  matters. 

Experience  shows  that  the  normal  curve  suffices  to  fit  certain 
types of distribution, besides those which arise in tossing coins and
in  similar  experiments,  with  remarkable  accuracy  ;  among  these 
may  be  noted  : — 

1.  Certain  biological  statistics ;  for  instance,  the  proportions  of 
male  to  female  births  taken  over  a  series  of  years  for  a  large  com- 
munity such  as  the  population  of  a  country ;  also  the  propor- 
tions of  different  types  of  plants  and  animals  resulting  from  cross- 
fertilization. 

2.  Certain  anthropometrical,  particularly  craniometrical  and  allied 
statistics, such as the height, weight, lengths of various bones, skull
measurements,  etc.,  of  a  large  group  of  persons,  and  the  agreement 
is  the  closer  if  the  group  be  reasonably  homogeneous,  i.e.  composed 
of  individuals  of  the  same  nationality  and  sex  between  the  same 
narrow  age  limits,  etc.  ;  also  measurements  of  a  similar  character 
in  animals  and  plants. 

3. Errors of observation in experimental work; for example,
several measurements of the same quantity (length, weight, speed,
temperature, or whatever it be) will contain errors of this kind
which are equally liable to be above or below the true value.

4.  The  marks  of  shots  upon  a  given  target,  assuming  that  the 
shots  are  equally  liable  to  err  in  any  given  direction.  This  is  an 
interesting  case  of  the  normal  law  in  two  dimensions,  for  the  north 
and  south  line  and  the  east  and  west  line  through  the  centre  of 
the  target  may  both  be  regarded  as  axes  of  normal  curves  of  error.* 

5.  Certain  sociological  statistics  of  a  comparatively  stationary  char- 
acter ;  for  example,  rates  of  birth,  marriage,  or  death  at  neighbour- 
ing times  or  like  places  ;  also  the  wages  (and  possibly  the  output 
if  it  could  be  satisfactorily  measured)  of  large  numbers  of  workers 
engaged  in  the  same  occupation  under  the  same  general  conditions. 

6.  Any  statistics  or  quantities  that  are  individually  compounded  of 
a  large  number  of  elements,  mostly  independent  of  one  another,  which 
themselves  vary  between  limits  not  very  widely  divergent,  and  none 
of  which  exert  a  preponderating  influence  upon  their  resultant 
statistic.  The  latter  may  be  simply  the  sum  of  its  elements,  or, 
more  generally,  it  may  be  any  function  of  the  elements  which,  to 
the  first  degree  of  approximation,  can  be  expressed  in  linear  form. 

Now  it  would  be  a  difficult  matter  in  most  of  these  cases  to  satisfy 
ourselves  as  to  the  fulfilment  or  non-fulfilment  of  conditions  like 
those on which the binomial distribution rests. It is not easy
indeed to visualize them perfectly, except in artificial experiments
where  they  are  largely  under  control.  If  anything,  the  chances 
seem  almost  hopelessly  against  their  fulfilment  in  ordinary  life, 
so  closely  must  we  hedge  round  our  sample  to  keep  out  unequal 
influences.  For  example,  to  use  a  frequently  quoted  illustration, 
if  p  measures  the  chance  of  death  for  an  individual,  the  death  rate 
varies,  as  we  know,  considerably  from  place  to  place  according  to 
the  age  and  sex  constitution  of  the  population  ;  it  is  influenced  by 
differences  in  class,  and  occupation,  and  manner  of  life  ;  it  is 
altered  from  time  to  time,  violently  by  the  ravages  of  war  or  disease, 
more  gradually  by  improvement  in  general  sanitation,  housing 
conditions, etc. We should only expect to get the binomial distribution
(and consequently the normal law if it depended upon the
same postulates) exactly verified if we were dealing with the same
stationary population existing under the same stable conditions
over a long period of time; moreover, since $p$ is to be identical for
each individual event in the ideal case, it would be further necessary
that every family and every individual in our population should
also remain in the same stationary and stable state. This is manifestly
impossible, especially after the industrial revolution which
the advent of machine power created.

[* Sir John Herschel published in the Edinburgh Review (1850) an a priori
proof of the normal law from a consideration of this problem. Taking $\phi(x^2)$ as
the expression of the law for one dimension and $\phi(x^2+y^2)$ for two dimensions,
the independence of errors in perpendicular directions leads to the functional
equation $\phi(x^2+y^2)=\phi(x^2)\times\phi(y^2)$, the solution of which is of the form
$\phi(x^2)=Ae^{-x^2/c^2}$, $A$ and $c$ being constants. It should be added that the
assumptions underlying the proof are not entirely above criticism.]

These  considerations  suggest  the  interesting  question  whether  the 
various  types  of  statistics  we  have  enumerated,  as  being  approxi- 
mately subject  to  the  normal  law,  could  not,  if  we  knew  more 
about  them,  really  all  be  included  under  heading  number  (6),  repre- 
senting a  further  development  from  the  binomial  theory  and  an 
enlargement  of  the  field  in  which  it  holds  good. 

In  an  earlier  chapter,  when  we  were  discussing  the  connection 
between  marriage  rate  and  prices,  we  showed  how  it  was  possible 
by a method of averaging to differentiate between long-time and
short-time effects. The more transient fluctuations, only superficial
in character, were removed and the real nature of any permanent
change in the figures was revealed. In much the same
way,  when  we  have  a  group  of  statistics  which  do  not  perhaps  fit 
a  normal  curve  of  error  at  all  closely,  it  may  be  possible  by  random 
averaging  to  get  rid  of  some  of  the  fluctuations  which  cause  the 
badness  of  fit  and  to  obtain  a  new  group  of  statistics  which  more 
nearly obey the normal law. Averaging, that is to say, tends
to smooth away the rough outstanding abnormalities; and we shall
presently show that if two variables, $X_1$, $X_2$, which are independent,
obey the normal law, any linear function of the variables,
$(w_1X_1+w_2X_2)$, obeys the same law. This may throw some light
on Class (6) where each statistic represents a compound, that is,
in a broad sense, a kind of an average of a large number of elements
which partially neutralize one another's influence, or rub the corners
off one another, so to speak, since no single element is, by hypothesis,
to exert an overwhelming influence upon the compound itself.

But  although  the  normal  curve  does  serve  to  describe  a  consider- 
able number  of  frequency  distributions  within  reasonable  limits, 
there  are  many  more  cases  in  which  it  fails  :  for  example,  the 
greater  part  of  those  bearing  on  economic  matters ;  also  statistics 
relating  to  the  incidence  of  disease  and  degree  of  fertility  are,  as 
a  rule,  very  markedly  skew.  Hence  arose  the  necessity  for  an 
extension  from  the  symmetrical  normal  to  some  kind  of  skew 
variation curves to fit such distributions.

The normal curve, however, has an importance of its own to
which  we  must  now  draw  special  attention.  It  is  the  foundation 
of  the  theory  of  errors  and  provides  us  with  an  invaluable  method 
of  estimating  the  importance  of  one  error  in  comparison  with 
another,  or  of  determining  the  probability  that  an  error  shall  lie 
between  stated  limits.  Upon  it  we  depend  for  several  most 
important  approximations  which  are  in  constant  use. 

The term 'error' is used here in the sense that if we take the
mean of a number of observations, the deviation of any one of
them from the mean may be termed its error. When such deviations
can be satisfactorily fitted, that is, within the limits of random
sampling, by means of a normal curve, they are said to be subject
to the normal law of error. This law is expressed, as we
have seen, by the equation

$$y=\frac{N}{\sigma\sqrt{2\pi}}\,e^{-x^{2}/2\sigma^{2}},$$

where $y\,\delta x$ measures the frequency with which an observed
organ or character deviates from the mean by an amount lying
between $x$ and $(x+\delta x)$ in a large population, i.e. $y\,\delta x$
registers the frequency of an error of size $x$ to $(x+\delta x)$, and
$N$ and $\sigma$ are constants dependent upon the particular
application of the law.

The probability curve or normal curve of error. As a guide to the
drawing of the above curve it may be worth while plotting

$$y = e^{-x^2}.$$

This is readily done by writing the equation in the form

$$-x^2 = \log_e y.$$

Giving now to $y$ the values 0, 0.1, 0.2, etc., we can find values of
$\log_e y$ as shown in Table (48), and, by means of a square root table,
$x$ is then determined.

Table (48). Corresponding Values of x and y to plot $y = e^{-x^2}$.

    y      log_e y      x       |    y      log_e y      x
    0      -inf         ±inf    |    0.6    -0.5108      ±0.71
    0.1    -2.3026      ±1.52   |    0.7    -0.3567      ±0.60
    0.2    -1.6096      ±1.27   |    0.8    -0.2232      ±0.47
    0.3    -1.2040      ±1.10   |    0.9    -0.1054      ±0.32
    0.4    -0.9163      ±0.96   |    1.0     0            0
    0.5    -0.6932      ±0.83   |

This enables us to plot the graph as shown in fig. (40). Since
$\log_e 1 = 0$, and the logarithm of any number greater than 1 is
positive and thus cannot be equal to $-x^2$, it follows that $y$ cannot be
greater than 1. Moreover $y$ cannot be less than 0, for the logarithm
of a negative quantity is meaningless, but, as $y$ approaches 0,
$x$ approaches $\pm\infty$.

Also  the  curve  is  symmetrical  about  OY  because  for  any  possible 
value  of  y  there  are  two  values  of  x,  equal  and  opposite. 
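
The tabulation just described can be reproduced mechanically. The following is only a sketch in Python (the function name `curve_points` is an illustrative choice, not part of the text): for each chosen $y$ it takes the natural logarithm and then the square root of $-\log_e y$, which is the procedure used for Table (48).

```python
import math

def curve_points(y_values):
    """Return (y, log_e y, x) rows for the curve y = exp(-x**2), with x >= 0."""
    rows = []
    for y in y_values:
        log_y = math.log(y)                  # natural logarithm, log_e y
        x = math.sqrt(max(0.0, -log_y))      # from -x^2 = log_e y
        rows.append((y, log_y, x))
    return rows

# y = 0 is left out: log_e 0 is minus infinity, so x is infinite there.
table = curve_points([k / 10 for k in range(1, 11)])
for y, log_y, x in table:
    print(f"{y:.1f}  {log_y:8.4f}  +/-{x:.2f}")
```

Each printed $x$ is to be taken with both signs, since the curve is symmetrical about OY.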

Returning now to the curve

$$y = \frac{N}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2},$$

it must be of the same general shape as $y = e^{-x^2}$ because the two
only differ in their constants. It is clearly symmetrical, for
instance, about the axis of $y$, because, in this case also, to any value
of $y$ there are two values of $x$ equal and opposite. Moreover it
tails off to the right and left from OY, the axis of $x$ being an
asymptote, for as $x$ tends to $\pm\infty$, $y$ tends to zero as before.

Fig. (40). The graph of $y = e^{-x^2}$ (x from -2.00 to 2.00).


When $x = 0$, $y = N/\sqrt{2\pi}\,\sigma$, giving the point B, fig. (41),
where the curve cuts the axis of $y$. This is evidently the highest
point on the curve, for

$$\frac{dy}{dx} = -\frac{N x}{\sqrt{2\pi}\,\sigma^3}\, e^{-x^2/2\sigma^2},$$

and this vanishes when $x = 0$.

Again,

$$\frac{d^2y}{dx^2} = \frac{N}{\sqrt{2\pi}\,\sigma^5}\,(x^2 - \sigma^2)\, e^{-x^2/2\sigma^2},$$

which vanishes when $x = \pm\sigma$, and at these two points, H, H', we
therefore have 'points of inflexion' where the bend of the curve
changes its direction.

The  axis  of  y  about  which  there  is  symmetry  evidently  locates 
the  mean  error,  in  this  case  zero  ;  in  fact  the  mean  and  mode 
coincide,  so  that  the  mean  or  zero  error  is  also  the  one  which  most 
frequently  occurs,  and  any  two  other  errors  which  are  equal  in 
magnitude  but  above  and  below  the  mean  respectively  occur  with 
equal  frequency  :  i.e.  the  frequency  of  positive  errors  is  balanced 
by  the  equal  frequency  of  negative  errors  on  the  other  side  of  the 
mean,  making  the  median  error  likewise  zero. 

Again, the area $\int_{x_1}^{x_2} y\,dx$ measures the frequency of errors
lying between $x_1$ and $x_2$ above the mean; $\int_{-x}^{+x} y\,dx$
registers the frequency of errors between 0 and $x$, or of deviations
up to this magnitude, on either side of the mean; and, in particular,
for all errors

$$\text{the total frequency} = \int_{-\infty}^{+\infty} y\,dx
= \frac{N}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{+\infty} e^{-x^2/2\sigma^2}\,dx
= \frac{N}{\sqrt{2\pi}\,\sigma}\,(\sqrt{2\pi}\,\sigma) \ \text{(as on p. 206)}
= N.$$

Fig. (41).

This enables us, by means of the fundamental definition, at once
to write down the probability of errors between any stated limits
and explains the origin of the name, the probability curve, which
is sometimes given to the equation. Thus we have the probability
of an error between $+x_1$ and $+x_2$

$$= \frac{\text{frequency of errors between the given limits}}{\text{frequency of all errors}}
= \int_{x_1}^{x_2} y\,dx \Big/ \int_{-\infty}^{+\infty} y\,dx
= \frac{1}{\sqrt{2\pi}\,\sigma}\int_{x_1}^{x_2} e^{-x^2/2\sigma^2}\,dx. \qquad (1)$$

Incidentally, the probability of an error between $x$ and $(x+\delta x)$

$$= \frac{\delta x}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2}. \qquad (2)$$

Fig.  (42). 

Geometrically, the area represented by the shaded portion of
fig. (42) measures the frequency of errors between $+x_1$ and $+x_2$,
while the complete area between the curve and axis X'OX measures
the total frequency, so that the probability of an error between
$+x_1$ and $+x_2$ is measured by the proportion which the area of the
shaded portion bears to the whole area.

If in the above expression (1) we put $x/\sigma = \xi$, so that
$dx/d\xi = \sigma$, it becomes

$$\frac{1}{\sqrt{2\pi}}\int_{\xi_1}^{\xi_2} e^{-\xi^2/2}\,d\xi, \qquad (3)$$

which is known as the probability integral, $\xi_1$ and $\xi_2$ being the
values of $\xi$ which correspond to the values $x_1$ and $x_2$ of $x$. But
this integral measures the area of the shaded portion of the curve

$$y = \frac{1}{\sqrt{2\pi}}\, e^{-\xi^2/2} \qquad (4)$$

shown in fig. (43), which is really the normal curve over again, but
drawn on a different scale, namely, with the ordinates reduced in
the ratio $N : \sigma$ and with the standard deviation $\sigma$ taken as the
unit of measurement for $x$, for $\xi = 1, 2, 3, \ldots$ when
$x = \sigma, 2\sigma, 3\sigma, \ldots$
This has the effect of making the total area unity and the area
given by

$$\frac{1}{\sqrt{2\pi}}\int_{\xi_1}^{\xi_2} e^{-\xi^2/2}\,d\xi \qquad (3)\ \text{bis}$$

now directly measures the probability of an error between
$\sigma\xi_1$ and $\sigma\xi_2$.

Tables have been prepared (see pp. 284, 285) which enable
us to write down the value of this integral for different values
of $\xi_1$ and $\xi_2$ between certain limits (see Appendix, Note 10).
Let us take an example to show how the curve may be
used, and we choose one leading to a binomial distribution, so
giving an expression for the probability by first principles,
in order to compare the two methods.

Fig. (43).


Example. Suppose we toss simultaneously 100 coins, and suppose
the chance of success, say 'heads,' is the same for each coin
and equal to 1/2. In that case, according to the binomial theory,

the probability of 100 heads $= (1/2)^{100}$,
the probability of 99 heads and 1 tail $= {}^{100}C_1\,(1/2)^{99}(1/2)$,
the probability of 98 heads and 2 tails $= {}^{100}C_2\,(1/2)^{98}(1/2)^2$, and so on.

The most probable number of heads $= np = (100)(1/2) = 50$. This
does not mean, as explained before, that if we perform the
experiment once we are sure on that one occasion to get exactly
50 heads and 50 tails, but that if we go on repeating the experiment
we shall in the long run get 50 heads and 50 tails turning up more
often than any other combination.

Let it be required to find the probability of getting at least 55
heads, that is, we want the probability of getting 55 heads or
more, and this is given by

$$\left(\tfrac{1}{2}\right)^{100}\left[{}^{100}C_{55} + {}^{100}C_{56} + \ldots + {}^{100}C_{100}\right],$$

a sum not very readily calculated if we have to go at it in a
straightforward manner.

Now let us turn to the curve of error method. The standard
deviation for the distribution is given by

$$\sigma = \sqrt{npq} = \sqrt{100 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2}} = 5.$$

Since the mean number of heads to be expected if the experiment
is repeated a considerable number of times $= 50$, we want to find
the probability of an error equal to or greater than 5, i.e. an error
lying between $\sigma$ and $+\infty$, because $\sigma = 5$.

But the probability of an error between $\sigma\xi_1$ and $\sigma\xi_2$

$$= \frac{1}{\sqrt{2\pi}}\int_{\xi_1}^{\xi_2} e^{-\xi^2/2}\,d\xi.$$

Hence the required probability

$$= \frac{1}{\sqrt{2\pi}}\int_{1}^{\infty} e^{-\xi^2/2}\,d\xi = 0.15866, \ \text{by the probability integral tables.}$$

In other words, if we repeated the experiment 100 times, we might
expect 55 or more heads about 16 times.
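
With a machine the 'straightforward' binomial sum is no longer laborious, so the two methods can be set side by side. The sketch below is an illustration in Python, not part of the original working; the half-unit continuity correction in the last line is an added assumption which the text does not use.

```python
import math

n, p = 100, 0.5

# Exact binomial tail: P(heads >= 55) = sum of C(100, k) (1/2)^100 for k = 55..100.
exact = sum(math.comb(n, k) for k in range(55, n + 1)) / 2 ** n

# Curve-of-error method as in the text: an error of sigma or more, i.e. xi >= 1.
mean = n * p
sigma = math.sqrt(n * p * (1 - p))
xi = (55 - mean) / sigma
approx = 0.5 * math.erfc(xi / math.sqrt(2))      # upper tail of the unit normal curve

# Assumption (not in the text): start the tail half a unit lower, at 54.5 heads.
approx_cc = 0.5 * math.erfc((54.5 - mean) / sigma / math.sqrt(2))

print(f"exact sum            {exact:.5f}")
print(f"normal curve         {approx:.5f}")
print(f"with half-unit shift {approx_cc:.5f}")
```

The exact sum comes out somewhat above the tabled 0.15866 (roughly 0.18), because the continuous curve measured from 55 leaves out half of the block of frequency belonging to exactly 55 heads; the shifted limit allows for this.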

We can now show that if $X_1$, $X_2$ are two uncorrelated variables
obeying the normal law, then $(w_1X_1 + w_2X_2)$ will obey the same law.

Suppose $x_1$, $x_2$ are observed deviations from the mean values
$\bar{X}_1$, $\bar{X}_2$ in one particular record, $\sigma_1$, $\sigma_2$ being the
respective S.D.'s.

Let $X = w_1X_1 + w_2X_2$, and let $x$ be the deviation in $X$
corresponding to deviations $x_1$, $x_2$ in the given variables.

Thus
$$\bar{X} + x = w_1(\bar{X}_1 + x_1) + w_2(\bar{X}_2 + x_2)
= (w_1\bar{X}_1 + w_2\bar{X}_2) + (w_1x_1 + w_2x_2).$$
Therefore, $x = w_1x_1 + w_2x_2$.

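But the same error $x$ may be obtained by giving $x_1$, $x_2$ many different
values provided their weighted sum is unaltered. Let us first
keep $x_1$ constant, so that the corresponding value of $x_2$ required
to produce an error lying between $x$ and $(x+\delta x)$, where $\delta x$ is
small, must be such that

$$x < w_1x_1 + w_2x_2 < x + \delta x,$$

i.e.
$$x - w_1x_1 < w_2x_2 < x - w_1x_1 + \delta x,$$

i.e. $x_2$ lies between $(x - w_1x_1)/w_2$ and $(x - w_1x_1 + \delta x)/w_2$,
and the probability for this

$$= \frac{\delta x}{w_2}\cdot\frac{1}{\sqrt{2\pi}\,\sigma_2}\, e^{-(x - w_1x_1)^2/2\sigma_2^2w_2^2}.$$

Now this is in a form which only involves $\delta x$, $x$, and $x_1$, and we
get the total probability for an error lying between $x$ and $(x+\delta x)$
by giving all possible values to the error $x_1$.

But the probability for $x_1$ itself to lie between $x_1$ and $(x_1+\delta x_1)$

$$= \frac{1}{\sqrt{2\pi}\,\sigma_1}\int_{x_1}^{x_1+\delta x_1} e^{-x^2/2\sigma_1^2}\,dx
= \frac{\delta x_1}{\sqrt{2\pi}\,\sigma_1}\, e^{-x_1^2/2\sigma_1^2}, \ \text{by (2)},$$

and the probability for this to concur with a suitable $x_2$ to produce
an error in the weighted sum lying between $x$ and $(x+\delta x)$, on the
assumption that $X_1$ and $X_2$ are independent, is therefore

$$\left[\frac{\delta x_1}{\sigma_1\sqrt{2\pi}}\, e^{-x_1^2/2\sigma_1^2}\right]
\left[\frac{\delta x}{w_2\,\sigma_2\sqrt{2\pi}}\, e^{-(x - w_1x_1)^2/2\sigma_2^2w_2^2}\right]
= \frac{\delta x}{w_2 \cdot 2\pi\sigma_1\sigma_2}\,
e^{-\frac{x_1^2}{2\sigma_1^2} - \frac{(x - w_1x_1)^2}{2\sigma_2^2w_2^2}}\,\delta x_1.$$

Hence the total probability for an error lying between $x$ and $(x+\delta x)$
is obtained by integrating this result, that is, summing all possible
probabilities, between $x_1 = -\infty$ and $x_1 = +\infty$. This gives

$$\frac{\delta x}{w_2 \cdot 2\pi\sigma_1\sigma_2}\int_{-\infty}^{+\infty}
e^{-\frac{x_1^2}{2\sigma_1^2} - \frac{(x - w_1x_1)^2}{2\sigma_2^2w_2^2}}\,dx_1$$

$$= \frac{\delta x}{w_2 \cdot 2\pi\sigma_1\sigma_2}\int_{-\infty}^{+\infty}
e^{-\left(\frac{\sigma^2 x_1^2}{2\sigma_1^2\sigma_2^2w_2^2} - \frac{2w_1 x\,x_1}{2\sigma_2^2w_2^2} + \frac{x^2}{2\sigma_2^2w_2^2}\right)}\,dx_1,
\quad \text{where } \sigma^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2,$$

$$= \frac{\delta x}{w_2 \cdot 2\pi\sigma_1\sigma_2}\, e^{-x^2/2\sigma^2}
\cdot \frac{\sqrt{2}\,\sigma_1\sigma_2w_2}{\sigma}\int_{-\infty}^{+\infty} e^{-t^2}\,dt,
\quad \text{where } t = \frac{1}{\sqrt{2}}\left(\frac{\sigma}{\sigma_1\sigma_2w_2}\,x_1 - \frac{w_1\sigma_1}{\sigma_2w_2\sigma}\,x\right),$$

$$= \frac{\delta x}{w_2 \cdot 2\pi\sigma_1\sigma_2}\, e^{-x^2/2\sigma^2}
\cdot \frac{\sqrt{\pi}\,\sqrt{2}\,\sigma_1\sigma_2w_2}{\sigma}
= \frac{\delta x}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2},$$

which proves that the error $x$ obeys the normal law with

$$\text{S.D.} = \sqrt{w_1^2\sigma_1^2 + w_2^2\sigma_2^2}. \qquad (5)$$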

The above principle is readily extended, for if

$$X = w_1X_1 + w_2X_2 + \ldots + w_nX_n,$$

$X_1, X_2, \ldots X_n$ being independent variables obeying the normal
law, then $X$ also obeys the normal law and its

$$\text{S.D.} = \sqrt{w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + \ldots + w_n^2\sigma_n^2}. \qquad (6)$$
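
Formula (5) can be checked roughly by experiment. The following Python sketch draws two independent normal errors many times, forms their weighted sum, and compares the observed S.D. with the predicted one; the weights, the S.D.'s, the number of trials and the function name are all arbitrary illustrative choices.

```python
import math
import random

def simulate_weighted_sum_sd(w1, s1, w2, s2, trials=200_000, seed=0):
    """Observed S.D. of w1*x1 + w2*x2 for independent normal errors x1, x2."""
    rng = random.Random(seed)
    total = total_sq = 0.0
    for _ in range(trials):
        x = w1 * rng.gauss(0.0, s1) + w2 * rng.gauss(0.0, s2)
        total += x
        total_sq += x * x
    mean = total / trials
    return math.sqrt(total_sq / trials - mean * mean)

w1, s1, w2, s2 = 2.0, 3.0, -1.0, 4.0
predicted = math.sqrt(w1**2 * s1**2 + w2**2 * s2**2)   # formula (5)
observed = simulate_weighted_sum_sd(w1, s1, w2, s2)
print(f"predicted S.D. {predicted:.3f}, observed S.D. {observed:.3f}")
```

Such a trial only tests the value of the S.D.; that the sum is itself normal rests on the proof given above.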

In discussing the results of random sampling we worked upon
the principle that, given a number of sample observations of any
statistical constant, a mean or a percentage or a coefficient of
regression or anything else, an error or deviation as large as $\sigma$,
the standard deviation, from the true value for the whole population
might quite likely occur, but that an error exceeding $3\sigma$ would be
unlikely, and we explained that, as a result of convention, the
probable error, equal to $\tfrac{2}{3}\sigma$ roughly, was largely used in
place of $\sigma$ by many writers. We have now to examine the basis of this
principle, and the first point to notice is that it only strictly applies
to a normal distribution.

To find the probability of an error lying between $-\sigma$ and $+\sigma$ in a
normal distribution.

The required probability

$$= \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\sigma}^{+\sigma} e^{-x^2/2\sigma^2}\,dx
= \frac{1}{\sqrt{2\pi}}\int_{-1}^{+1} e^{-\xi^2/2}\,d\xi \ (\text{where } x = \sigma\xi)
= \frac{2}{\sqrt{2\pi}}\int_{0}^{1} e^{-\xi^2/2}\,d\xi$$

$= 0.6827$, by means of the tables.

This then is the probability that the error in a given sample shall
not exceed the S.D., $\sigma$. The probability that the error shall exceed
$\sigma$ is accordingly $(1 - 0.68) = 0.32$. It therefore appears that the
odds against an error exceeding this amount are 68 to 32, or about
2 to 1.

The probability of an error between $-2\sigma$ and $+2\sigma$

$$= \frac{1}{\sqrt{2\pi}}\int_{-2}^{+2} e^{-\xi^2/2}\,d\xi = 0.9545,$$

and the probability of an error outside these limits $= 0.0455$.

Hence the odds against an error exceeding $2\sigma$ are about 21 to 1.

The probability of an error between $-3\sigma$ and $+3\sigma$

$$= \frac{1}{\sqrt{2\pi}}\int_{-3}^{+3} e^{-\xi^2/2}\,d\xi = 0.9973.$$

Hence the odds against an error exceeding $3\sigma$ are about 370 to 1.
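
These three values can be recovered without the printed tables. The Python sketch below relies on the standard identity that the probability of a normal error lying within $k\sigma$ of the mean is $\operatorname{erf}(k/\sqrt{2})$; the function name `prob_within` is an illustrative choice.

```python
import math

def prob_within(k):
    """Probability that a normal error lies between -k*sigma and +k*sigma."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = prob_within(k)
    odds = inside / (1 - inside)
    print(f"{k} sigma: inside {inside:.4f}, odds against exceeding about {odds:.0f} to 1")
```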


Fig. (44).

That these results are reasonable can be seen by an examination
of the curve of error

$$y = \frac{N}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2},$$

the graph of which is drawn, fig. (44), in the particular case when
$\sigma = 5$, $N = 100$. The maximum ordinate is thus $= 20/\sqrt{2\pi} = 7.98$,
and the curve becomes

$$y = 7.98\, e^{-x^2/50}.$$

When $x = \sigma = 5$, $y = (7.98)(0.606) = 4.84$, $P_1N_1$ in the figure.
When $x = 2\sigma = 10$, $y = (7.98)(0.135) = 1.08$, $P_2N_2$ in the figure.
When $x = 3\sigma = 15$, $y = (7.98)(0.011) = 0.09$, $P_3N_3$ in the figure.

There is a point of inflexion where the curve changes its
direction at $P_1$, also at the companion point $P'_1$ on the other side
of OB.

The areas $ON_1P_1B$, $ON_2P_2B$, $ON_3P_3B$, $P_3N_3X$ represent
respectively the frequencies of errors 0 to $\sigma$, 0 to $2\sigma$, 0 to $3\sigma$,
$3\sigma$ and over (considering only errors on the positive side, that is,
deviations above the mean), and the figure shows how very improbable is a
deviation from the mean exceeding $3\sigma$, for the area between the
curve and axis beyond this limit is negligible. Put in another
way, a range of $6\sigma$ should include practically all the observations
in the sample.

The  probable  error  has  in  the  past  received  various  names,  such 
as  mean  error,  median  error,  quartile  deviation,  and  although  some 
of  these  may  seem  more  applicable  and  less  confusing  than  the 
name  to  which  it  has  settled  down,  there  is  perhaps  not  sufficient 
excuse  for  unsettling  it  again,  even  had  we  the  power  to  do  so,
by  attempting  a  return  to  one  of  these  old  names. 

If  its  magnitude  be  r  it  is  defined  to  be  such  that  the  chance 
of  an  error  falling  within  the  limits  —r  and  +r  is  exactly  equal  to 
the  chance  of  an  error  falling  outside  these  limits,  in  fact  it  is  an 
even  chance  whether  a  particular  error  falls  within  these  limits 
or  not. 

Since  area  measures  frequency  it  follows  that  the  ordinates 
drawn  through  the  probable  errors  divide  both  halves  of  the  normal 
curve  (above  and  below  the  mean)  into  two  equal  parts  ;  the  one 
above  the  mean,  QR,  is  shown  in  fig.  (44),  and  consequently  the 
area  OBQR = the  area  QRX,  in  that  figure.  These  ordinates  therefore
coincide  with  the  quartiles,  and  the  probable  error  is  precisely
the  same  measure  as  the  quartile  deviation. 

The magnitude of the error is readily calculated from the
probability integral table, for, by definition, we have

$$\frac{1}{2} = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-r}^{+r} e^{-x^2/2\sigma^2}\,dx
= \frac{1}{\sqrt{2\pi}}\int_{-r/\sigma}^{+r/\sigma} e^{-\xi^2/2}\,d\xi \ (\text{where } x = \sigma\xi),$$

and the probability integral table at once gives

$$r = 0.6745\,\sigma = \text{approximately } \tfrac{2}{3}\sigma.$$

Thus we have the frequently quoted rule that the

quartile deviation $= \tfrac{2}{3}$(standard deviation), . . . (7)

or probable error $= 0.6745$ (S.D.)

The probability of an error lying between $-3r$ and $+3r$

$$= \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-3r}^{+3r} e^{-x^2/2\sigma^2}\,dx
= \frac{2}{\sqrt{2\pi}}\int_{0}^{3(0.6745)} e^{-\xi^2/2}\,d\xi \ (\text{where } x = \sigma\xi \text{ as before})$$

$= 0.9570$.

Thus the odds against a deviation exceeding three times the probable
error occurring in a single trial are about 22 to 1, or much the same
as the odds against a deviation exceeding twice the S.D.
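
The probable error can likewise be found by inverting the probability integral directly. This is a sketch using Python's `statistics.NormalDist` (available from Python 3.8); it uses the fact, stated above, that the probable error coincides with the quartile, so that $r/\sigma$ is the point below which three-quarters of the unit normal curve lies.

```python
from statistics import NormalDist

unit = NormalDist()                  # normal curve with mean 0 and S.D. 1
r = unit.inv_cdf(0.75)               # upper quartile, i.e. r / sigma
p_3r = unit.cdf(3 * r) - unit.cdf(-3 * r)

print(f"probable error = {r:.4f} sigma")
print(f"probability of an error within 3r = {p_3r:.4f}")
```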

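There remains one other standard of measurement in connection
with errors which is at least deserving of mention, namely, what
we have previously called the mean deviation, which may be denoted
by $\eta$. It is simply the mean of all errors without regard to sign;
thus, since $y\,\delta x$ measures the frequency of an error lying between
$x$ and $(x+\delta x)$,

$$\eta = 2\int_0^\infty x\,y\,dx \Big/ 2\int_0^\infty y\,dx
= \int_0^\infty x\,e^{-x^2/2\sigma^2}\,dx \Big/ \int_0^\infty e^{-x^2/2\sigma^2}\,dx$$

$$= \sigma\int_0^\infty \xi\,e^{-\xi^2/2}\,d\xi \Big/ \int_0^\infty e^{-\xi^2/2}\,d\xi \ (\text{where } x = \sigma\xi)$$

$$= \sqrt{2}\,\sigma\int_0^\infty t\,e^{-t^2}\,dt \Big/ \int_0^\infty e^{-t^2}\,dt \ (\text{where } \xi^2 = 2t^2)$$

$$= \sqrt{2}\,\sigma\left[-\tfrac{1}{2}e^{-t^2}\right]_0^\infty \Big/ \frac{\sqrt{\pi}}{2}
= \sigma\sqrt{2/\pi} = 0.7979\,\sigma,$$

hence the rough rule that the

mean deviation $= \tfrac{4}{5}$(standard deviation) . . . (8)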
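
The ratio of the mean deviation to the S.D. may also be checked by direct summation of the two integrals. The sketch below, in Python, uses a simple midpoint rule; the step count, the cut-off at ten times the S.D. and the function name are arbitrary choices made for illustration.

```python
import math

def mean_deviation(sigma, steps=200_000, span=10.0):
    """Midpoint-rule estimate of the mean error, without regard to sign."""
    upper = span * sigma             # the curve is negligible beyond this point
    h = upper / steps
    num = den = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        weight = math.exp(-x * x / (2 * sigma * sigma))
        num += x * weight * h        # integral of x * y over positive errors
        den += weight * h            # integral of y over positive errors
    return num / den

sigma = 5.0
eta = mean_deviation(sigma)
print(f"mean deviation / S.D. = {eta / sigma:.4f}")
```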

It must be borne in mind that all the above rules relating to
errors, using the term as synonymous with the deviations of single
or sample observations from the mean of a considerable number of
the same character, strictly apply, as we said before, to the normal
curve of error and are only approximately true for other distributions,
the approximation being the closer the nearer they approach
to the normal form and the larger the number of observations
involved. They have been tested in some cases in earlier chapters
(see, for example, Chapter VII.), and the results obtained, even
with very skew distributions of comparatively small numbers of
observations, are at all events close enough to suggest the utility
of the rules in more favourable cases.

The effect of variability on errors. The probability of an error
lying between 0 and $t$

$$= \frac{1}{\sqrt{2\pi}\,\sigma}\int_0^{t} e^{-x^2/2\sigma^2}\,dx.$$

Put $x = x'/m$, and this becomes

$$\frac{1}{\sqrt{2\pi}\,(m\sigma)}\int_0^{mt} e^{-x'^2/2(m\sigma)^2}\,dx'.$$

Fig. (45).

Thus,  if  the  variability  be  increased  m-fold  the  range  of  error  (of 
equal  probability)  is  increased  m-fold,  so  that  if  we  have  two  sets 
of  N  observations,  with  the  variability  of  one  set  double  that  of 
the  other,  the  range  of  error  also  in  the  one  set  is  double  that  which 
is  equally  likely  to  occur  in  the  other.  This  is  brought  out  fairly 
clearly in fig. (45), which is the result of plotting the curve

$$y = \frac{N}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2}$$

in the two cases. The variability $\sigma$ of curve (1) is double that
of curve (2); if then we measure along OX in the figure

$$ON_1 = 2\,ON_2 = 2t,$$

the area $B_1ON_1P_1$ will be equal to the area $B_2ON_2P_2$, showing that
the probability for an error between 0 and $2t$ in the one case is equal
to the probability for an error between 0 and $t$ in the other case.
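
The scaling property is easily confirmed numerically. In this Python sketch the values of $\sigma$, $t$ and $m$ are arbitrary, and `prob_zero_to` is an illustrative helper giving the probability of an error between 0 and $t$ by way of the error function.

```python
import math

def prob_zero_to(t, sigma):
    """Probability of a normal error lying between 0 and t when the S.D. is sigma."""
    return 0.5 * math.erf(t / (sigma * math.sqrt(2)))

sigma, t, m = 2.5, 3.0, 2.0
narrow = prob_zero_to(t, sigma)            # range t with variability sigma
wide = prob_zero_to(m * t, m * sigma)      # range m*t with variability m*sigma
print(narrow, wide)
```

The two printed probabilities agree, as the substitution $x = x'/m$ requires.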

[James  Bernoulli  (1654-1705),  the  eldest  of  three  remarkable  brothers, 
showed  how  the  binomial  theorem  could  be  used  to  estimate  the  probability 
that  the  ratio  of  the  number  of  successes  to  the  number  of  failures  under 
defined  conditions  should  lie  between  set  limits,  where  success  means  that  a 
certain  event  happens  and  failure  means  that  it  fails  to  happen.

It  was  Gauss  who  first  actually  published  a  proof  (1809)  of  the  equation  of  the 
normal  curve,  although  Laplace  had  suggested  as  early  as  1783  the  utility 
of  a  probability  integral  table,  $\int e^{-t^2}\,dt$.  Gauss's  proof  depended  upon  certain
axioms  which  cannot  be  established  and  are  not  necessarily  true,  one  of  which 
was  that  '  errors  above  and  below  the  mean  are  equally  probable.'  Laplace 
and  Poisson  improved  upon  Gauss  and  succeeded  without  assuming  this 
axiom,  but  with  the  aid  of  theorems  due  to  Euler  and  Stirling,  in  developing 
the  continuous  probability  integral  from  the  discontinuous  binomial  series. 

Further extensions of the normal curve applicable to skew distributions
have been worked out by other writers, such as Galton and McAlister, Fechner,
Lipps, Werner, Charlier, Kapteyn, and finally by Edgeworth, who has
contributed materially to the development of the idea of 'the Law of Great
Numbers.' Karl Pearson, approaching the subject of skew variation from
the same point but by an original route, has discovered a complete system of
curves suitable for fitting almost all kinds of distributions in homogeneous
material, especially such as are met with in the biological world.

(See  Todhunter,  History  of  Probability. 

Edgeworth,  Law  of  Error  in  the  Encyclopaedia  Britannica  (10th  edition). 
Pearson,  Das  Fehlergesetz  und  seine  Verallgemeinerungen  durch  Fechner 
und  Pearson  :   A  Rejoinder  ;  Biometrika,  vol.  iv.,  pp.  169-212).] 


CHAPTER    XIX 

FREQUENCY    SURFACE    FOR    TWO    CORRELATED    VARIABLES 

It  may  serve  at  this  stage  to  widen  the  outlook  upon  the  subject 
of  correlation  for  those  who  are  able  to  follow  it  up  on  mathematical
lines  if  we  briefly  consider  the  algebraical  expression  for
the  combined  distribution  of  two  variables. 

Let the variables be $X_1$, $X_2$. They may be absolutely independent
or they may be related in some way, but in either case we shall
assume it possible to set up a one-to-one correspondence between
them: thus, $X_1$ might represent the marriage rate and $X_2$ the
index number for wholesale prices, and we might always pair
together the $X_1$ and the $X_2$ which refer to the same year, as in the
correlation example in a previous chapter; moreover this pairing
might still be effected even if there were really no other connection
at all between $X_1$ and $X_2$.

If then $x_1$, $x_2$ typify the deviations of $X_1$, $X_2$ from their respective
means (the means in the above case being derived by averaging
the figures for a number of years), it is possible to write down an
expression of the form

$$y = F(x_1, x_2)$$

for determining the probability of deviations between $x_1$ and
$(x_1+\delta x_1)$, $x_2$ and $(x_2+\delta x_2)$, occurring simultaneously (in the
same year, in the above case); or, to put the same thing in another way,
$y\,\delta x_1\,\delta x_2$ would represent the proportional frequency with which
such deviations might be expected to occur together in a large
number of observations.

The frequency curve $y = f(x)$, where $y\,\delta x$ denotes the frequency
with which a variable with deviation lying between $x$ and $(x+\delta x)$
from its mean value is observed in a given distribution, was
represented by plotting corresponding pairs of values of $x$ and $y$ as
points in a plane. In the expression $y = F(x_1, x_2)$, however, we have
three variables to consider, $x_1$ and $x_2$, and $y$ which measures the
frequency of the simultaneous appearance of $x_1$ and $x_2$. Such a
trio may geometrically be represented by a point P $(x_1, x_2, y)$ in
space of three dimensions, for $(x_1, x_2)$ can first be located as a point
in a fixed plane and a height $y$ may then be measured above this
plane as in fig. (46). Clearly as $x_1$ and $x_2$ vary, $y$ also varies, and
consequently the point P moves about in space, but it moves always
in obedience to the relation

$$y = F(x_1, x_2).$$

Fig. (46).

This relation is called the equation of the surface along which
P travels, showing that it holds good for
the co-ordinates $(x_1, x_2, y)$ of any position
which the point can take up on that surface.
It is convenient, however, to use the notation

$$z = F(x, y)$$

in preference to $y = F(x_1, x_2)$ for the 'frequency surface,'
because OX, OY are nearly always taken to represent the axes of
reference in space of two dimensions (i.e. in a plane), and by a natural
extension OX, OY, OZ are taken to represent the axes of reference
in space of three dimensions, fig. (47).

We  proceed  to  discuss  the  frequency  surface  for  two  variables, 
and  we  shall  start  with  the  comparatively  simple  case  when  the 
variables  are  completely  independent. 

Frequency surface showing distribution of
two completely independent variables each
subject to the normal law.

Let $X$, $Y$ be the variables, and let $x$, $y$ denote
deviations from their means $\bar{X}$, $\bar{Y}$, the
point $(\bar{X}, \bar{Y})$ being taken as origin of co-ordinates
and the usual notation being adopted.

Thus the probability of a deviation between $x$ and $(x+\delta x)$ occurring

$$= \frac{\delta x}{\sqrt{2\pi}\,\sigma_x}\, e^{-x^2/2\sigma_x^2},$$

and the probability of a deviation between $y$ and $(y+\delta y)$ occurring

$$= \frac{\delta y}{\sqrt{2\pi}\,\sigma_y}\, e^{-y^2/2\sigma_y^2}.$$

Therefore the probability of such deviations occurring together,
since the variables are supposed completely independent,

$$= \left(\frac{\delta x}{\sqrt{2\pi}\,\sigma_x}\, e^{-x^2/2\sigma_x^2}\right)
\left(\frac{\delta y}{\sqrt{2\pi}\,\sigma_y}\, e^{-y^2/2\sigma_y^2}\right)
= \frac{\delta x\,\delta y}{2\pi\sigma_x\sigma_y}\, e^{-x^2/2\sigma_x^2 - y^2/2\sigma_y^2}.$$

Fig. (47).

Hence the frequency with which such pairs of deviations are
observed together, if $n$ be the total number of observations,

$$= \frac{n\,\delta x\,\delta y}{2\pi\sigma_x\sigma_y}\, e^{-x^2/2\sigma_x^2 - y^2/2\sigma_y^2}.$$

Denoting this by $z\,\delta x\,\delta y$, we get for the required frequency surface

$$z = \frac{n}{2\pi\sigma_x\sigma_y}\, e^{-x^2/2\sigma_x^2 - y^2/2\sigma_y^2}. \qquad (1)$$

If we give $y$ some particular value, $y_1$, we find from the above
equation that the law of frequency for the corresponding $x$ is

$$z = \left[\frac{n}{2\pi\sigma_x\sigma_y}\, e^{-y_1^2/2\sigma_y^2}\right] e^{-x^2/2\sigma_x^2}
= \frac{n_1}{\sqrt{2\pi}\,\sigma_x}\, e^{-x^2/2\sigma_x^2},$$

where $n_1$ has been written in place of

$$\frac{n}{\sqrt{2\pi}\,\sigma_y}\, e^{-y_1^2/2\sigma_y^2}.$$

But this is evidently a normal curve in the plane $X_1OZ_1$, having
the same mean, $\bar{X}$, and the same S.D., $\sigma_x$, whatever be the value
of $y_1$.

Hence all arrays of $X$ are similar, having the same mean and the
same standard deviation, and this, by symmetry, also applies to
all arrays of $y$.

Now put $z$ equal to some constant, $k$, in equation (1), so that

$$\frac{2\pi\sigma_x\sigma_y}{n}\,k = e^{-x^2/2\sigma_x^2 - y^2/2\sigma_y^2}.$$

Since the left-hand side of this equation is constant for different
values of $(x, y)$, it follows that the right-hand side is also constant,
and hence

$$\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} = c, \qquad (2)$$

where $c$ is a constant.

We conclude that the values of $x$ and $y$ which can occur together
with a given frequency, $k$, are such that the point $(x, y)$ always lies
somewhere on the ellipse (2) in the plane $z = k$, fig. (48); e.g. values
in the neighbourhood of $x_1$ and $y_1$ occur with the same frequency as
values in the neighbourhood of $x_2$ and 0, because in the figure the
points $(x_1, y_1, k)$ and $(x_2, 0, k)$ both lie on the ellipse defined by

$$z = k, \qquad \frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} = c.$$
The  different  ellipses  which  can  be  obtained  by  varying  the 
frequency,  and  consequently  varying  c,  are  clearly  concentric, 
similar,  and  similarly  situated  if  they  are  orthogonally  projected 
on  to  the  plane  z=0,  for  the  effect  of  such  projection  is  that  any 


Fig.  (48). 

point  (x,  y,  z)  drops  down  on  to  the  point  (x,  y,  0)  which  stands 
immediately  below  it  in  the  plane  XOY. 

The general shape of the surface can be gathered from fig. (48)
where the ellipse $z = k$, and the normal curves $x = 0$, $y = 0$, and
$y = y_1$ have been drawn.

It will also be noted that if the scales of $x$ and $y$ are altered by
writing $x/\sigma_x = x'$ and $y/\sigma_y = y'$, so that unit change in each
may be the same, the ellipse (2) becomes a circle

$$x'^2 + y'^2 = c.$$

This change of scales is equivalent geometrically to projecting
orthogonally the ellipse into a circle; of course the planes of
projection are not the same as in the previous orthogonal projection
mentioned.

Frequency surface for two correlated variables. Let the variables
be $X$ and $Y$, and let us work as before with their deviations $x$ and $y$,
which is equivalent to taking the mean point $(\bar{X}, \bar{Y})$ of all the
observations as origin.

Now the line of regression giving the best $y$, or the $y$ of greatest
frequency, corresponding to any $x$ is

$$y = r\,\frac{\sigma_y}{\sigma_x}\,x,$$

with the usual notation, $r$ being the coefficient of correlation
between $X$ and $Y$.

Hence the error made in estimating any $y$ from this equation
instead of taking the $y$ given by observation is

$$\eta = y \text{ (observed)} - y \text{ (estimated)} = y - r\,\frac{\sigma_y}{\sigma_x}\,x. \quad \text{[See fig. (49).]}$$


Thus, corresponding to every pair of observations $(x, y)$ there is
an $\eta$, and the same $\eta$ will be repeated
just as often as the same pair of
observations $(x, y)$ is repeated.

Therefore the frequency distribution
of $(x, \eta)$ must exactly correspond
to that of $(x, y)$.

Further, the correlation of the
variables $x$ and $\eta$ is zero, for positive
and negative errors $\eta$ are equally likely to occur for different
values of $x$; in fact, this coefficient of correlation is
$\Sigma(x\eta)/n\sigma_x\sigma_\eta$, and

$$\Sigma(x\eta) = \Sigma\left\{x\left[y - r\,\frac{\sigma_y}{\sigma_x}\,x\right]\right\}
= \Sigma(xy) - r\,\frac{\sigma_y}{\sigma_x}\,\Sigma(x^2)
= np - r\,\frac{\sigma_y}{\sigma_x}\cdot n\sigma_x^2
= np - np = 0,$$

where $p = \Sigma(xy)/n = r\sigma_x\sigma_y$ is the mean product.

Fig. (49).
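
That $\Sigma(x\eta)$ vanishes is an algebraic identity, and so holds for any set of paired observations, not merely in the long run. The Python sketch below illustrates this with made-up data (the sample size, the slope 0.6 and the S.D.'s are arbitrary assumptions); it also computes the S.D. of $\eta$ for comparison with $\sigma_y\sqrt{1-r^2}$, a relation derived further on in the text.

```python
import math
import random

rng = random.Random(1)
n = 5_000
# Hypothetical data: y depends partly on x, so the two are correlated.
xs = [rng.gauss(0.0, 2.0) for _ in range(n)]
ys = [0.6 * x + rng.gauss(0.0, 1.5) for x in xs]

# Work with deviations from the means, as in the text.
mx, my = sum(xs) / n, sum(ys) / n
dx = [x - mx for x in xs]
dy = [y - my for y in ys]

sx = math.sqrt(sum(x * x for x in dx) / n)
sy = math.sqrt(sum(y * y for y in dy) / n)
p = sum(x * y for x, y in zip(dx, dy)) / n       # mean product
r = p / (sx * sy)                                # coefficient of correlation

# Error of estimate from the regression line y = r (sy / sx) x.
eta = [y - r * (sy / sx) * x for x, y in zip(dx, dy)]
sum_x_eta = sum(x * e for x, e in zip(dx, eta))
s_eta = math.sqrt(sum(e * e for e in eta) / n)

print(f"r = {r:.3f}, mean of x*eta = {sum_x_eta / n:.2e}")
print(f"S.D. of eta = {s_eta:.4f}, sy*sqrt(1-r^2) = {sy * math.sqrt(1 - r * r):.4f}")
```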


Assuming  then  that  the  variables  x  and  7;  are  quite  independent, 
the  probability  of  them  occurring  together  is  readily  \^Titten  do^^^l, 
for  it  is  simply  the  product  of  their  separate  probabilities. 
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But  the  probability  of  a  deviation  between  x  and  {x-\-%x)  occur- 
ring,  if  we  consider  this  variable  alone,  is 


1^     .,-2S, 


V27r(T, 


and  the  probability  of  a  deviation  between  7;  and  (tz+S?;)  occurring.    \ 
if  we  consider  this  variable  alone,  is  \ 


s^^-2:> 


a/27 
Hence  the  probability  of  a  combined  occurrence  of  such  deviations 


a;2 


\V27r(7^  /  \V27rc7^ 


277(7^(7, 
_   83:87; 


27ror^CT„ 


U,2+        0-2        j 


27ro-a^, 
But  mj^^=E(y-r'^x 


2 


:2;(2/2)-2r  .  ^^  .  2:{xy)+r-^2(x^) 


jy 


2 


Similarly,  no^^na^iX—^^ 

where  f  is  the  error  made  in  estimating  x  from  x=r—y 

...  %=<=(l-r^). 

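Thus

$$\frac{1}{\sigma_\eta^2} = \frac{1}{\sigma_y^2(1-r^2)}, \qquad \frac{r\sigma_y}{\sigma_x\sigma_\eta^2} = \frac{r}{\sigma_x\sigma_y(1-r^2)} = \frac{r}{\sigma_\xi\sigma_\eta},$$

and

$$\frac{1}{\sigma_x^2} + \frac{r^2\sigma_y^2}{\sigma_x^2\sigma_\eta^2} = \frac{1}{\sigma_x^2}\left(1 + \frac{r^2}{1-r^2}\right) = \frac{1}{\sigma_x^2(1-r^2)} = \frac{1}{\sigma_\xi^2}.$$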


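Hence the probability of the combined occurrence of deviations $x$ to $(x+\delta x)$, $\eta$ to $(\eta+\delta\eta)$

$$= \frac{\delta x\,\delta\eta}{2\pi\sigma_x\sigma_y\sqrt{1-r^2}}\,e^{-\frac{1}{2}\left(\frac{x^2}{\sigma_\xi^2} - \frac{2rxy}{\sigma_\xi\sigma_\eta} + \frac{y^2}{\sigma_\eta^2}\right)};$$

thus, if we denote by $z\,\delta x\,\delta y$ the frequency of the combined occurrence of deviations $x$ to $(x+\delta x)$, $y$ to $(y+\delta y)$, when $n$ is the total number of observations, we have *

$$z = \frac{n}{2\pi\sqrt{1-r^2}\,\sigma_x\sigma_y}\,e^{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_x^2} - \frac{2rxy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right)}.$$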


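When the variables X and Y are completely independent, so that $r$ is zero, this reduces, as it should, to our previous result

$$z = \frac{n}{2\pi\sigma_x\sigma_y}\,e^{-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2}+\frac{y^2}{\sigma_y^2}\right)}.$$

In the surface

$$z = \mu\,e^{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_x^2} - \frac{2rxy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right)}, \qquad (3)$$

where $\mu = \dfrac{n}{2\pi\sqrt{1-r^2}\,\sigma_x\sigma_y}$, if we give $y$ some particular value $y_1$, we find that the law of frequency for the corresponding $x$ is

$$z = \mu\,e^{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_x^2} - \frac{2rxy_1}{\sigma_x\sigma_y} + \frac{y_1^2}{\sigma_y^2}\right)}$$

$$= \mu\,e^{-\frac{1}{2(1-r^2)}\left\{\left(\frac{x}{\sigma_x} - r\frac{y_1}{\sigma_y}\right)^2 + (1-r^2)\frac{y_1^2}{\sigma_y^2}\right\}}$$

$$= \mu\,e^{-\frac{y_1^2}{2\sigma_y^2}}\,e^{-\frac{1}{2(1-r^2)}\left(\frac{x}{\sigma_x} - r\frac{y_1}{\sigma_y}\right)^2}. \qquad (4)$$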
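The passage from the surface (3) to the form (4) is a completion of the square, and it can be confirmed numerically. The sketch below is illustrative only: the function names and the values chosen for $\sigma_x$, $\sigma_y$, $r$ and $y_1$ are assumptions, not taken from the text.

```python
import math

def surface_z(x, y, sx, sy, r, n=1.0):
    """Frequency surface (3)."""
    mu = n / (2 * math.pi * math.sqrt(1 - r * r) * sx * sy)
    q = x * x / sx**2 - 2 * r * x * y / (sx * sy) + y * y / sy**2
    return mu * math.exp(-q / (2 * (1 - r * r)))

def array_curve(x, y1, sx, sy, r, n=1.0):
    """Form (4): the section of the surface by the plane y = y1."""
    mu = n / (2 * math.pi * math.sqrt(1 - r * r) * sx * sy)
    shifted = x / sx - r * y1 / sy
    return (mu * math.exp(-y1 * y1 / (2 * sy**2))
            * math.exp(-shifted**2 / (2 * (1 - r * r))))

SX, SY, R, Y1 = 2.0, 3.0, 0.6, 1.5      # illustrative values only
array_mean = R * SX / SY * Y1           # x at which the section is highest

for x in (-3.0, -1.0, 0.0, array_mean, 2.0, 4.0):
    print(x, surface_z(x, Y1, SX, SY, R), array_curve(x, Y1, SX, SY, R))
```

The two columns printed should agree at every $x$, and the largest value should occur at `array_mean`.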


[* For an outline of Karl Pearson's method of reaching the Law of Frequency for two correlated variables, and certain deductions from it, see Appendix, Note 11.]

But just as

$$y = \frac{1}{\sqrt{2\pi}\,\sigma_x}\,e^{-\frac{(x-a)^2}{2\sigma_x^2}}$$

represents exactly the same normal curve as

$$y = \frac{1}{\sqrt{2\pi}\,\sigma_x}\,e^{-\frac{x^2}{2\sigma_x^2}}$$

shifted through a distance $a$ along the axis of $x$, fig. (50), so we conclude that the curve (4) in $x$ and $z$, in the plane $y=y_1$, is exactly the same as the normal curve

$$z = \mu\,e^{-\frac{y_1^2}{2\sigma_y^2}}\,e^{-\frac{x^2}{2\sigma_x^2(1-r^2)}}$$

shifted through a distance $ry_1\dfrac{\sigma_x}{\sigma_y}$ along an axis parallel to OX.

Fig. (50).

In fact, (4) represents a normal distribution for $x$, the mean, corresponding to greatest frequency when $z=\mu e^{-y_1^2/2\sigma_y^2}$, being determined by the intersection with the surface (3) of the planes

$$y = y_1, \qquad \frac{x}{\sigma_x} = r\frac{y}{\sigma_y},$$

and the standard deviation being $\sigma_x\sqrt{1-r^2}$, which we note is independent of $y_1$, fig. (51). To put the same thing in another way, the array of $x$'s corresponding to a particular value $y_1$ of $y$ have a mean deviating from $\bar{X}$ by $r\dfrac{\sigma_x}{\sigma_y}y_1$, and a standard deviation $\sigma_x\sqrt{1-r^2}$.

In particular, when $y=0$, $z=\mu e^{-\frac{x^2}{2\sigma_x^2(1-r^2)}}$, a normal distribution for $x$, the mean, corresponding to greatest frequency with $z=\mu$, being determined by the intersection with the surface (3) of the planes $y=0$, $\dfrac{x}{\sigma_x}=r\dfrac{y}{\sigma_y}$, and the standard deviation being $\sigma_x\sqrt{1-r^2}$ as before.

Similarly, when $x=x_1$, we get as in (4) a normal distribution for $y$,

$$z = \mu\,e^{-\frac{x_1^2}{2\sigma_x^2}}\,e^{-\frac{1}{2(1-r^2)}\left(\frac{y}{\sigma_y} - r\frac{x_1}{\sigma_x}\right)^2},$$

the mean, corresponding to greatest frequency when $z=\mu e^{-x_1^2/2\sigma_x^2}$, being determined by the intersection with the surface (3) of the planes

$$x = x_1, \qquad \frac{y}{\sigma_y} = r\frac{x}{\sigma_x},$$

and the standard deviation being $\sigma_y\sqrt{1-r^2}$, which is independent of $x_1$. In other words, the array of $y$'s corresponding to a particular value $x_1$ of $x$ have a mean deviating from $\bar{Y}$ by $r\dfrac{\sigma_y}{\sigma_x}x_1$, and a standard deviation $\sigma_y\sqrt{1-r^2}$.

In particular, when $x=0$, $z=\mu e^{-\frac{y^2}{2\sigma_y^2(1-r^2)}}$, a normal distribution for $y$, the mean, corresponding to greatest frequency with $z=\mu$, being determined by the intersection with the surface (3) of the planes $x=0$, $\dfrac{y}{\sigma_y}=r\dfrac{x}{\sigma_x}$, and the standard deviation being $\sigma_y\sqrt{1-r^2}$.
By putting $z$ = some constant, $k$, and arguing just as we did in the case of two independent variables, we find that all values of $x$ and $y$ which occur together with the same frequency define points $(x, y)$ which lie on the ellipse

$$\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y} = c.$$

Fig. (51).

The different ellipses which can be obtained by varying the frequency, and consequently varying $c$, are concentric, similar, and similarly situated, if they are orthogonally projected on to the plane $z=0$. The planes giving the means of the $x$'s, or the most frequent $x$'s, corresponding to particular values of $y$, and the means of the $y$'s, or the most frequent $y$'s, corresponding to particular values of $x$, meet $z=0$ in the lines of regression

$$\frac{x}{\sigma_x} = r\frac{y}{\sigma_y}, \qquad \frac{y}{\sigma_y} = r\frac{x}{\sigma_x}.$$

If we alter the scales of $x$ and $y$ by writing $\dfrac{x}{\sigma_x}=x'$ and $\dfrac{y}{\sigma_y}=y'$, so that unit change in each shall be of the same magnitude, the frequency surface takes the form

$$z = \mu\,e^{-\frac{1}{2(1-r^2)}\left(x'^2 + y'^2 - 2rx'y'\right)}.$$

When $y'=0$, $z=\mu e^{-\frac{x'^2}{2(1-r^2)}}$, a normal distribution, the mean being on the plane $x'=ry'$, and the standard deviation being $\sqrt{1-r^2}$.

Similarly for $x'=0$. When $y'=y'_1$, $z=\mu e^{-\frac{1}{2}y_1'^2}\,e^{-\frac{(x'-ry'_1)^2}{2(1-r^2)}}$, a normal distribution, the mean being on the plane $x'=ry'$, and the standard deviation being $\sqrt{1-r^2}$ as before. Similarly for $x'=x'_1$.

Again the ellipse which is the locus of the points $(x', y')$ obtained by putting $z$ = constant, $k$, corresponding to variables which occur with the same frequency, is (in the plane $z=k$) now

$$x'^2 + y'^2 - 2rx'y' = c,$$

and, projecting on to the plane $z=0$, the lines of regression are

$$x' = ry', \qquad y' = rx'.$$

These lines are the intersections with $z=0$ of the planes containing the means of the $x'$'s, or the most frequent $x'$'s, corresponding to particular $y'$'s, and vice versa.

Since, geometrically, the transformation $\dfrac{x}{\sigma_x}=x'$, $\dfrac{y}{\sigma_y}=y'$, is equivalent to an orthogonal projection, we may learn something about the more general ellipse by considering properties of the simpler projected curve which are not changed by projection.

Let us first, however, find the magnitude and direction of the axes of

$$x'^2 + y'^2 - 2rx'y' = c.$$

By turning the axes through some angle $\theta$ this equation is reducible to the form

$$\frac{x''^2}{a^2} + \frac{y''^2}{b^2} = 1,$$

which is the ordinary form for an ellipse when its axes lie along the axes of co-ordinates. But the equation in $x'$, $y'$ is clearly symmetrical about the lines $y'=x'$ and $y'=-x'$, because $y'$ and $x'$ or $y'$ and $-x'$ can be interchanged without the equation being affected. Hence these lines must give the directions of the major and minor axes.

To turn the axes of co-ordinates through an angle of 45°, fig. (52), we must write

$$x' = x''\cos 45° - y''\sin 45° = \frac{x''-y''}{\sqrt{2}},$$

$$y' = x''\sin 45° + y''\cos 45° = \frac{x''+y''}{\sqrt{2}}.$$

Fig. (52).

The equation of the ellipse thus becomes

$$\frac{(x''-y'')^2}{2} + \frac{(x''+y'')^2}{2} - 2r\,\frac{(x''-y'')(x''+y'')}{\sqrt{2}\,\sqrt{2}} = c,$$

i.e.

$$x''^2 + y''^2 - r\,(x''^2 - y''^2) = c,$$

i.e.

$$x''^2(1-r) + y''^2(1+r) = c,$$

i.e.

$$\frac{x''^2}{\dfrac{c}{1-r}} + \frac{y''^2}{\dfrac{c}{1+r}} = 1.$$


Hence the semi-major axis is $a=\sqrt{\dfrac{c}{1-r}}$, and the semi-minor axis is $b=\sqrt{\dfrac{c}{1+r}}$. We note that as $r$ increases from 0 to 1, $a$ increases from $\sqrt{c}$ to $\infty$, while $b$ decreases from $\sqrt{c}$ to $\sqrt{c/2}$. Also, as $r$ decreases from 0 to $-1$, $a$ decreases from $\sqrt{c}$ to $\sqrt{c/2}$, while $b$ increases from $\sqrt{c}$ to $\infty$.

The ellipses, $x'^2+y'^2-2rx'y'=c$, corresponding to different values of $r$ all pass through the points of intersection of

$$x'^2 + y'^2 = c \quad\text{and}\quad x'y' = 0.$$

But $x'^2+y'^2=c$ is what the equation of the ellipse becomes when $r$, the coefficient of correlation, vanishes. The connection between these curves is shown in fig. (53), which represents their projection on to the plane $z=0$. A positive correlation between $x$ and $y$ might be expected to increase the $y$ corresponding to a particular positive $x$, if the frequency be fixed beforehand, and that is the effect which the figure also would suggest.
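The values found above for the semi-axes, $\sqrt{c/(1-r)}$ along $y'=x'$ and $\sqrt{c/(1+r)}$ along $y'=-x'$, may be verified by checking that the ends of the axes lie on the ellipse $x'^2+y'^2-2rx'y'=c$. This is only a sketch; the function names and the values $r=0.6$, $c=1$ are chosen for illustration.

```python
import math

def semi_axes(r, c):
    """Semi-axes of x'^2 + y'^2 - 2 r x' y' = c along y' = x' and y' = -x'."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.sqrt(c / (1 - r)), math.sqrt(c / (1 + r))

def ellipse_lhs(xp, yp, r):
    """Left-hand side of the ellipse equation in the primed co-ordinates."""
    return xp * xp + yp * yp - 2 * r * xp * yp

r, c = 0.6, 1.0                      # illustrative values only
a, b = semi_axes(r, c)
# End of the axis along y' = x'  (x'' = a, y'' = 0), turned back through 45 degrees,
# and end of the axis along y' = -x' (x'' = 0, y'' = b).
end_a = (a / math.sqrt(2), a / math.sqrt(2))
end_b = (-b / math.sqrt(2), b / math.sqrt(2))
print(a, b, ellipse_lhs(*end_a, r), ellipse_lhs(*end_b, r))
```

For a positive $r$ the first axis is the longer; for a negative $r$ the roles are exchanged, as the text remarks.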


Fig.  (53). 


Now, in

$$x'^2 + y'^2 - 2rx'y' = c,$$

the lines of regression are

$$y' = rx', \qquad y' = \frac{1}{r}x',$$

and the axes of the ellipse are

$$y' = x', \qquad y' = -x'.$$

Hence the lines of regression are equally inclined to the axes of the ellipse as well as to the axes of co-ordinates, fig. (54).
Further, the pair of lines

$$y' = x', \qquad y' = -x'$$

form a harmonic pencil with the pair

$$x' = 0, \qquad y' = 0,$$

and also with the pair

$$y' = rx', \qquad y' = \frac{1}{r}x'.$$

This is obvious from fig. (54).

Fig. (54).

Now project back to the ellipse

$$\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - 2r\frac{xy}{\sigma_x\sigma_y} = \text{constant}.$$

The algebraical transformation for this is merely

$$x' = \frac{x}{\sigma_x}, \qquad y' = \frac{y}{\sigma_y}.$$

Since the harmonic property is unaltered by projection we then have the pair of lines

$$\frac{y}{\sigma_y} = \frac{x}{\sigma_x}, \qquad \frac{y}{\sigma_y} = -\frac{x}{\sigma_x},$$

harmonic with the pair

$$x = 0, \qquad y = 0,$$

and also with the pair

$$\frac{y}{\sigma_y} = r\frac{x}{\sigma_x}, \qquad \frac{y}{\sigma_y} = \frac{1}{r}\,\frac{x}{\sigma_x}.$$

Hence the two lines of regression corresponding to maximum correlation ($r=+1$ and $r=-1$) are harmonic with

(1) the axes of co-ordinates;

(2) the lines of regression for any $r$.

Again it may be easily seen that the lines

$$y' = rx' \quad\text{and}\quad x' = 0$$

are conjugate diameters of the ellipse

$$x'^2 + y'^2 - 2rx'y' = c, \qquad (5)$$

for they may be written as one equation thus:

$$rx'^2 - x'y' = 0,$$

and this represents a pair of lines harmonic with the (imaginary) asymptotes of (5), namely, with

$$x'^2 + y'^2 - 2rx'y' = 0.$$

[The criterion for $ax^2+2hxy+by^2=0$ to be harmonic with $a'x^2+2h'xy+b'y^2=0$ is $ab'+ba'=2hh'$.]

But it is a well-known property of conics that any pair of lines harmonic with the asymptotes are conjugate diameters of the conic.

Similarly it may be shown that the lines

$$y' = \frac{1}{r}x' \quad\text{and}\quad y' = 0$$

are conjugate diameters of the ellipse (5).

But, on projection, the conjugate property also is unaltered.

Hence the lines $\dfrac{y}{\sigma_y}=r\dfrac{x}{\sigma_x}$, $x=0$,

and the lines $\dfrac{y}{\sigma_y}=\dfrac{1}{r}\,\dfrac{x}{\sigma_x}$, $y=0$

are conjugate pairs of diameters of the ellipse

$$\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - 2r\frac{xy}{\sigma_x\sigma_y} = c.$$

But for conjugate diameters the midpoints of all chords parallel to either lie on the other.

Thus we come back again by another route to the familiar line of regression theorems that, for a given $r$, all arrays parallel to $x=0$ have their means on $\dfrac{y}{\sigma_y}=r\dfrac{x}{\sigma_x}$, and all arrays parallel to $y=0$ have their means on $\dfrac{x}{\sigma_x}=r\dfrac{y}{\sigma_y}$.


APPENDIX 

1. Compound Interest Law. If the capital increases continuously, instead of going up by jumps at the end of stated periods, the connection between the original principal $S_0$, the rate per cent. per annum $r$, and the amount $S_t$ at the end of $t$ years is given by

$$S_t = S_0\,e^{rt/100},$$

for the rate of increase is measured by

$$\frac{dS}{dt} = \frac{rS}{100},$$

which leads at once to the above equation on integrating.
Other instances of the same law are:

(1) A particle moving against a resistance proportional to its velocity,

$$v_t = v_0\,e^{-ct},$$

where $v_t$ is the velocity at time $t$, $v_0$ is the original velocity, and $c$ is some constant.

(2) The variation of the pressure of the atmosphere with height,

$$p_h = p_0\,e^{-ch},$$

where $p_h$ is the pressure at height $h$ above a surface level, $p_0$ is the pressure at the surface, and $c$ is some constant.

(3) The rate of cooling,

$$\theta_t = \theta_0\,e^{-ct},$$

where $\theta_t$ is the excess of temperature at time $t$ of the hot body over that of surrounding bodies, $\theta_0$ is the excess when the measurement begins, and $c$ is some constant.

2. Weighted Mean. Let the observations be represented by the different values, $x_1, x_2, \ldots x_n$, of the variable concerned, and let the respective weights attached to these observations be $f_1, f_2, \ldots f_n$, so that the average, by definition,

$$\bar{x} = \frac{x_1f_1 + x_2f_2 + \ldots + x_nf_n}{f_1 + f_2 + \ldots + f_n}.$$

Now, suppose a different set of weights be chosen, namely, $f'_1, f'_2, \ldots f'_n$, giving a new average

$$\bar{x}' = \frac{x_1f'_1 + x_2f'_2 + \ldots + x_nf'_n}{f'_1 + f'_2 + \ldots + f'_n}.$$

The difference between these two expressions

$$= \frac{x_1f_1 + x_2f_2 + \ldots}{f_1 + f_2 + \ldots} - \frac{x_1f'_1 + x_2f'_2 + \ldots}{f'_1 + f'_2 + \ldots}$$

$$= \frac{(x_1f_1 + x_2f_2 + \ldots)(f'_1 + f'_2 + \ldots) - (x_1f'_1 + x_2f'_2 + \ldots)(f_1 + f_2 + \ldots)}{(f_1 + f_2 + \ldots)(f'_1 + f'_2 + \ldots)}$$

$$= \frac{\{f_1f'_2(x_1-x_2) - f_2f'_1(x_1-x_2)\} + \{f_1f'_3(x_1-x_3) - f_3f'_1(x_1-x_3)\} + \ldots}{(f_1 + f_2 + \ldots)(f'_1 + f'_2 + \ldots)}$$

$$= \frac{f'_1f'_2(x_1-x_2)\left(\dfrac{f_1}{f'_1} - \dfrac{f_2}{f'_2}\right) + f'_1f'_3(x_1-x_3)\left(\dfrac{f_1}{f'_1} - \dfrac{f_3}{f'_3}\right) + \ldots}{(f_1 + f_2 + \ldots)(f'_1 + f'_2 + \ldots)}.$$

Hence this difference is very small and the averages are very nearly equal if the weights $f_1, f_2, f_3 \ldots$ are replaced by others $f'_1, f'_2, f'_3 \ldots$ very nearly proportional to them, so that $f_1/f'_1$, $f_2/f'_2$, $f_3/f'_3 \ldots$ are not far from equality, and this is the more pronounced if the observations $x_1, x_2, x_3 \ldots$ themselves are all of the same order of magnitude and the sums of their weights, $\Sigma f$ and $\Sigma f'$, are large so that the expressions of type $f'_1f'_2(x_1-x_2)/(\Sigma f)(\Sigma f')$ are small.
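The identity for the difference of two weighted averages, and the conclusion that nearly proportional weights give nearly equal averages, can be tried on a small example. The sketch below uses invented observations and weights; the function names are likewise only illustrative.

```python
def weighted_mean(xs, fs):
    """Weighted average of the observations xs with weights fs."""
    return sum(x * f for x, f in zip(xs, fs)) / sum(fs)

def pairwise_difference(xs, f, g):
    """Difference of the two averages from the pairwise formula of the text,
    with f the first set of weights and g the second (the dashed) set."""
    total = 0.0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            total += g[i] * g[j] * (xs[i] - xs[j]) * (f[i] / g[i] - f[j] / g[j])
    return total / (sum(f) * sum(g))

xs = [98.0, 102.0, 105.0, 110.0]      # made-up observations
f = [10, 20, 30, 40]                  # first set of weights
g = [11, 19, 31, 39]                  # nearly proportional to f
h = [1, 2, 3, 4]                      # exactly proportional to f

direct = weighted_mean(xs, f) - weighted_mean(xs, g)
print(direct, pairwise_difference(xs, f, g))
print(weighted_mean(xs, f) - weighted_mean(xs, h))
```

The first line printed shows the same small difference computed in two ways; the second shows that exactly proportional weights leave the average unchanged.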

3. Geometric and Harmonic Means. Given $n$ numbers

$$a, b, c \ldots$$

their geometric mean, $g$, is defined by the formula

$$g = \sqrt[n]{(abc \ldots)},$$

and their harmonic mean, $h$, is defined by

$$\frac{1}{h} = \frac{1}{n}\left(\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \ldots\right).$$

We note that when $a=b=c=\ldots=k$, say,

then $g = \sqrt[n]{(kkk \ldots)} = \sqrt[n]{(k^n)} = k$

and $\dfrac{1}{h} = \dfrac{1}{n}\left(\dfrac{1}{k} + \dfrac{1}{k} + \dfrac{1}{k} + \ldots\right) = \dfrac{1}{k}$,

so that $h=k$.

It is worthy of remark that if the geometric mean be adopted as average in discussing the index numbers of prices it possesses an interesting property which does not hold for any of the other means in common use.

Suppose the prices of $n$ standard commodities at three successive dates be represented by $(a_1, b_1, c_1 \ldots)$, $(a_2, b_2, c_2 \ldots)$, $(a_3, b_3, c_3 \ldots)$. Then the index numbers of the separate commodity prices at the third date, taking the prices at the first date as standard, are

$$100\frac{a_3}{a_1},\quad 100\frac{b_3}{b_1},\quad 100\frac{c_3}{c_1} \ldots$$

Hence the geometric mean of these $n$ index numbers together

$$= \sqrt[n]{\left(100\frac{a_3}{a_1} \times 100\frac{b_3}{b_1} \times 100\frac{c_3}{c_1} \times \ldots\right)} = 100\,\frac{g_3}{g_1},$$

where $g_1$, $g_3$ denote the geometric means of the $n$ prices at the two dates.

It follows that the ratio

$$\frac{\text{index number of prices at 3rd date with prices at 1st date as standard}}{\text{index number of prices at 2nd date with prices at 1st date as standard}} = \frac{100\,g_3/g_1}{100\,g_2/g_1} = g_3/g_2.$$

It is therefore quite independent of the particular date chosen as standard.
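The property just stated, that the ratio of two geometric-mean index numbers does not depend on the date taken as standard, can be illustrated as follows. The prices are invented and the function names are assumptions made for the sketch.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def index_number(prices, standard):
    """Geometric mean of the separate index numbers 100 * price / standard price."""
    return geometric_mean([100 * p / s for p, s in zip(prices, standard)])

# Made-up prices of three commodities at three successive dates.
date1 = [4.0, 10.0, 2.5]
date2 = [5.0, 9.0, 3.0]
date3 = [6.0, 12.0, 2.0]

ratio_std1 = index_number(date3, date1) / index_number(date2, date1)
ratio_std2 = index_number(date3, date2) / index_number(date2, date2)
print(ratio_std1, ratio_std2, geometric_mean(date3) / geometric_mean(date2))
```

All three numbers printed should agree, whichever date is used as standard.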

4. The Mean of Combined Sets of Observations. (1) Suppose one variable $x$ is expressed as the sum of a number of other variables, thus

$$x = a + b + c + \ldots,$$

and suppose that we have $n$ different values of the variables, giving equations of the type

$$x_1 = a_1 + b_1 + c_1 + \ldots$$
$$x_2 = a_2 + b_2 + c_2 + \ldots$$
$$\ldots$$
$$x_n = a_n + b_n + c_n + \ldots$$

Hence, by addition,

$$\Sigma x = \Sigma a + \Sigma b + \Sigma c + \ldots,$$

so that

$$n\bar{x} = n\bar{a} + n\bar{b} + n\bar{c} + \ldots$$

$$\bar{x} = \bar{a} + \bar{b} + \bar{c} + \ldots,$$

where $\bar{x}, \bar{a}, \bar{b} \ldots$ denote the means of the $n$ values of the respective variables.

Thus the mean of a sum equals the sum of the means, and, if some of the positive signs in $(a+b+c+\ldots)$ are made negative, there will evidently be a corresponding change of sign in $(\bar{a}+\bar{b}+\ldots)$.

Example. Suppose 100 family budgets are collected and the items in each are separated under five heads: rent, food, clothes, coals and light, sundries. The expenditure, $x$, in each budget would thus be expressed as the sum of five variables, $a, b, c, d, e$, and the mean of the 100 different $x$'s would equal the sum of the means of the $a$'s, the $b$'s, the $c$'s, the $d$'s, and the $e$'s.

(2) Sets of observations are made which differ in locality or time or some other respect. To find the resultant mean.

Let $l$ observations of the variable $x$ refer, say, to one date, $m$ observations to a second date, $n$ observations to a third date, and so on, and let the means of these successive groups of observations be $\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots$, so that we may write

$$\bar{x}_1 = \Sigma x_1/l, \quad \bar{x}_2 = \Sigma x_2/m, \quad \bar{x}_3 = \Sigma x_3/n, \ldots$$

If then $\bar{x}$ be the resultant mean, we have

$$\bar{x} = \frac{\Sigma x_1 + \Sigma x_2 + \ldots}{l + m + \ldots} = \frac{l\bar{x}_1 + m\bar{x}_2 + \ldots}{l + m + \ldots}.$$

Example. If the school children in the different schools of a county are weighed, $l$ children in one school, $m$ in another, $n$ in another, and so on, giving mean weights $\bar{x}_1, \bar{x}_2, \bar{x}_3 \ldots$, the resultant mean weight for the children in all the schools combined is then given by the above expression.
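The resultant mean of several groups is thus the mean of the group means weighted by the numbers in the groups. A short sketch, with made-up weights for three small "schools" and an illustrative function name, compares this with the mean of all the observations pooled together.

```python
def combined_mean(sizes, means):
    """Resultant mean (l*x1 + m*x2 + ...) / (l + m + ...)."""
    return sum(k * m for k, m in zip(sizes, means)) / sum(sizes)

# Made-up weights of children in three schools.
schools = [[30.0, 32.0, 34.0], [40.0, 42.0], [35.0]]
sizes = [len(s) for s in schools]
means = [sum(s) / len(s) for s in schools]

pooled = [w for s in schools for w in s]
print(combined_mean(sizes, means), sum(pooled) / len(pooled))
```

Both numbers printed are 35.5 for these figures.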

5. Mean and Standard Deviation of a Distribution of Variables.

Let $x_1, x_2, x_3 \ldots x_n$ denote the deviations of each value, or group mid-value, of the observed organ or character when measured from some fixed value, and let $f_1, f_2, f_3 \ldots f_n$ denote the observed frequencies of these respective deviations.

The arithmetic mean of the variables is thus given by

$$\bar{x} = (f_1x_1 + f_2x_2 + \ldots + f_nx_n)/(f_1 + f_2 + \ldots + f_n),$$

referred to the fixed value as origin.

We may conveniently represent the deviations $x_1, x_2, x_3 \ldots$ by lengths measured from an arbitrary origin O along a straight line, in which case the point O defines the position of the fixed value from which the variables are measured.

Let P mark the position corresponding to a typical variable and let G mark the position corresponding to the mean, $\bar{x}$. Thus OP $=x$, OG $=\bar{x}$, and if we denote the distance of P from G by $\xi$, we have

$$x = \bar{x} + \xi.$$

Hence

$$\bar{x} = (f_1x_1 + f_2x_2 + \ldots + f_nx_n)/(f_1 + f_2 + \ldots + f_n)$$
$$= [f_1(\bar{x}+\xi_1) + f_2(\bar{x}+\xi_2) + \ldots + f_n(\bar{x}+\xi_n)]/(f_1 + f_2 + \ldots + f_n)$$
$$= [\bar{x}(f_1 + f_2 + \ldots + f_n) + (f_1\xi_1 + f_2\xi_2 + \ldots + f_n\xi_n)]/(f_1 + f_2 + \ldots + f_n)$$
$$= \bar{x} + (f_1\xi_1 + f_2\xi_2 + \ldots + f_n\xi_n)/(f_1 + f_2 + \ldots + f_n).$$

Therefore

$$(f_1\xi_1 + f_2\xi_2 + \ldots + f_n\xi_n) = 0. \qquad (1)$$

The expression $(f_1x_1 + f_2x_2 + \ldots + f_nx_n)$ is called the first moment of the distribution referred to O as origin. We conclude that when the distribution is referred to G as origin, i.e. when deviations are measured from the mean of the distribution, the first moment vanishes.


Frequency Distribution Table.

      (1)                   (2)               (3)                  (4)
 Deviations of         Frequency of      Product of Nos.      Product of Nos.
 Variables from        Deviations.       in Col. (1) and      in Col. (1) and
 some fixed value.                       Col. (2).            Col. (3).

      x_1                  f_1              f_1 x_1              f_1 x_1^2
      x_2                  f_2              f_2 x_2              f_2 x_2^2
      x_3                  f_3              f_3 x_3              f_3 x_3^2
      to                   to               to                   to
      x_n                  f_n              f_n x_n              f_n x_n^2

                            N                N'_1                 N'_2

In the notation of the above table, where the dashes are omitted in $N_1$, $N_2$ when the mean is origin, we have

$$\bar{x} = N'_1/N \quad\text{and}\quad N_1 = 0.$$

Again, the root-mean-square deviation, $s$, measured from the arbitrary origin O, is given by

$$s^2 = (f_1x_1^2 + f_2x_2^2 + \ldots + f_nx_n^2)/(f_1 + f_2 + \ldots + f_n) = N'_2/N,$$

and $N'_2$ is called the second moment of the distribution referred to O as origin.

Substituting as before we have

$$s^2 = \frac{\bar{x}^2(f_1 + \ldots + f_n) + 2\bar{x}(f_1\xi_1 + \ldots + f_n\xi_n) + (f_1\xi_1^2 + \ldots + f_n\xi_n^2)}{(f_1 + \ldots + f_n)}$$

$$= \bar{x}^2 + (f_1\xi_1^2 + \ldots + f_n\xi_n^2)/(f_1 + \ldots + f_n),$$

since $f_1\xi_1 + \ldots + f_n\xi_n = 0$.

Hence

$$s^2 = \bar{x}^2 + \sigma^2, \qquad (2)$$

where $\sigma$ is the root-mean-square deviation measured from G as origin, or the standard deviation as it is called.

From this result it is clear that $\sigma$ is never greater than $s$, and is less than $s$ whenever O differs from G; in other words, the root-mean-square deviation is least when measured from the arithmetic mean.

Generally, if we write

$$\nu'_k = (f_1x_1^k + \ldots + f_nx_n^k)/(f_1 + \ldots + f_n),$$

$$\nu_k = (f_1\xi_1^k + \ldots + f_n\xi_n^k)/(f_1 + \ldots + f_n),$$

where $\Sigma(fx^k)$ and $\Sigma(f\xi^k)$ may be called the $k$th moments referred to O and to the mean as origins respectively, so that $\nu_1=0$, $\nu_2=\sigma^2$, $\nu'_2=s^2$, we have

$$\nu'_k = \Sigma f(\bar{x}+\xi)^k/\Sigma f = \nu_k + k\,\nu_{k-1}\,\bar{x} + \frac{k(k-1)}{2!}\,\nu_{k-2}\,\bar{x}^2 + \ldots + \bar{x}^k.$$

For example, when $k=2$, since $\nu_0=1$ and $\nu_1=0$,

$$\nu_2 = \nu'_2 - \bar{x}^2. \qquad (2)\ \text{bis}$$

Again, when $k=3$,

$$\nu_3 = \nu'_3 - 3\nu_2\bar{x} - \bar{x}^3, \qquad (3)$$

and, when $k=4$,

$$\nu_4 = \nu'_4 - 4\nu_3\bar{x} - 6\nu_2\bar{x}^2 - \bar{x}^4. \qquad (4)$$
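The relation between the moments about the arbitrary origin O and those about the mean G can be verified on a small frequency table. The following is a sketch only: the deviations and frequencies are invented, and the names `moment`, `nu`, `nu_dash` and `expansion` are illustrative.

```python
from math import comb

def moment(xs, fs, k, origin=0.0):
    """k-th moment per unit frequency of the deviations xs about `origin`."""
    return sum(f * (x - origin) ** k for x, f in zip(xs, fs)) / sum(fs)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]   # deviations from an arbitrary origin O
fs = [3, 8, 15, 12, 6, 2]               # made-up frequencies
mean = moment(xs, fs, 1)                # x-bar, the distance OG

def nu(k):
    """Moment about the mean G."""
    return moment(xs, fs, k, origin=mean)

def nu_dash(k):
    """Moment about the arbitrary origin O."""
    return moment(xs, fs, k)

def expansion(k):
    """Binomial expansion of the k-th moment about O in terms of moments about G."""
    return sum(comb(k, j) * nu(k - j) * mean ** j for j in range(k + 1))

for k in (2, 3, 4):
    print(k, nu_dash(k), expansion(k))
```

For each $k$ the two values printed should agree; the case $k=2$ is the relation $s^2=\bar{x}^2+\sigma^2$.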

There are interesting statical analogues to the above results concerning the mean and standard deviation.

Let us imagine a set of weights, $f_1, f_2, f_3 \ldots$ suspended at $P_1, P_2, P_3 \ldots$ from a straight horizontal bar, and let the distance of any typical weight $f$ from some arbitrary origin O on the bar be $x$. Then the first moment,

$$f_1x_1 + f_2x_2 + \ldots + f_nx_n$$

(where some of the $x$'s may be negative, corresponding to weights suspended to the left of O) measures the total turning effect of all the given weights about O, and if we further imagine all these weights replaced by a single weight equal to their sum $(f_1+f_2+\ldots+f_n)$, then, in order to produce the same turning effect, it would have to be placed at a point G, the distance of which from O is given by

$$\bar{x}(f_1 + f_2 + \ldots + f_n) = (f_1x_1 + f_2x_2 + \ldots + f_nx_n).$$

Thus

$$\bar{x} = (f_1x_1 + f_2x_2 + \ldots + f_nx_n)/(f_1 + f_2 + \ldots + f_n),$$

and, statically, this defines the position of the centre of gravity of the given weights, $f_1, f_2, \ldots f_n$, relative to O.

As before, $\bar{x} = \Sigma f(\bar{x}+\xi)/\Sigma f$;

hence $f_1\xi_1 + f_2\xi_2 + \ldots + f_n\xi_n = 0$,

and, statically, this means that the turning effect of $f_1, f_2 \ldots f_n$ about G is zero, in other words, the bar would balance freely about G.
Again, the second moment,

$$f_1x_1^2 + f_2x_2^2 + \ldots + f_nx_n^2,$$

measures the moment of inertia of the weights $f_1, f_2 \ldots f_n$ about O, and, if we imagine these different weights replaced by a single weight $(f_1+f_2+\ldots+f_n)$ as before, the moment of inertia will be unaltered if the latter be located at a distance $s$ from O, where

$$(f_1 + f_2 + \ldots + f_n)s^2 = (f_1x_1^2 + f_2x_2^2 + \ldots + f_nx_n^2);$$

therefore

$$s^2 = (f_1x_1^2 + \ldots + f_nx_n^2)/(f_1 + \ldots + f_n)$$
$$= \Sigma f(\bar{x}+\xi)^2/\Sigma f$$
$$= \bar{x}^2 + \sigma^2,$$

as before, and the interpretation of this is that the square of the radius of gyration of the system of weights about O equals the square of the radius of gyration about G, the centre of gravity of the system, together with the square of the distance of G from O. Also, $s$ is clearly least when it is measured from G.

6. The Mean Deviation a Minimum when measured from the Median. Consider first the case when only two different values of the variable are observed, $X_1$, $X_2$, and let their deviations from an arbitrary value, O, chosen as origin, be respectively $x_1$, $x_2$, both taken without regard to sign.

If $f_1$, $f_2$ be the observed frequencies of these values, the sum of their deviations from O is

$$f_1x_1 + f_2x_2,$$

which is clearly less when the value O lies between $X_1$, $X_2$ than when it is smaller or greater than both of them.

Choosing O, therefore, between $X_1$, $X_2$, if $f_1$ be the greater frequency we write the deviation sum

$$= f_2x + (f_1 - f_2)x_1,$$

where $x$ is the deviation of either of the values $X_1$, $X_2$ from the other, and $(f_1-f_2)$ is positive since $f_1>f_2$.

Now this is evidently least when $(f_1-f_2)x_1$ vanishes, i.e. when (1) $x_1=0$, in which case O coincides with $X_1$, the more frequent of the two variables, or, when (2) $f_1=f_2$, and in this case, when the two observed values occur equally often, the deviation sum is constant for any origin between $X_1$ and $X_2$.

When several different values of the variable are observed, they may be arranged in order of magnitude, $X_1, X_2, X_3 \ldots X_n$, from the least to the greatest, with frequencies $f_1, f_2, f_3 \ldots f_n$. If $f_1>f_n$ we pair off $f_n$ of the $X_n$'s with $f_n$ of the $X_1$'s; the deviation sum for this pair is least and remains constant when measured from any origin between $X_1$ and $X_n$. We next pair off some or all of the $X_1$'s which remain against an equal number of $X_{n-1}$'s and the deviation sum for this pair is least and remains constant when measured from any origin between $X_1$ and $X_{n-1}$. If some $X_1$'s still remain, we pair them off so far as we can against an equal number of $X_{n-2}$'s but, if it be $X_{n-1}$'s that remain, we pair them off against an equal number of $X_2$'s.

This process can evidently be continued until ultimately we reach the origin from which the mean deviation of the whole distribution is a minimum, for if any X be left unpaired the origin will coincide with that X. Otherwise, the deviation is least when measured from any value between the last two X's paired off together, and within that range it is constant.

Since, by definition, the median is the value of the variable half-way along the series of given observations, ranged in order of their magnitude and assigning each its due weight or frequency, it is clearly such that a balance can be effected by pairing off the values on either side of it against one another in the manner explained above; it therefore follows that the mean deviation of a frequency distribution is a minimum when the deviations are measured from the median.
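The conclusion that the sum of the deviations, taken without regard to sign, is least when measured from the median can be tried on a small frequency distribution. The sketch below uses invented values and frequencies; the helper names are illustrative, and when the total frequency is even the function simply returns the lower of the two middle values, any origin between them being equally good.

```python
def deviation_sum(values, freqs, origin):
    """Sum of the deviations from `origin`, without regard to sign."""
    return sum(f * abs(x - origin) for x, f in zip(values, freqs))

def weighted_median(values, freqs):
    """Value half-way along the observations ranged in order of magnitude."""
    half = sum(freqs) / 2
    running = 0
    for x, f in sorted(zip(values, freqs)):
        running += f
        if running >= half:
            return x

values = [1.0, 2.0, 3.0, 4.0, 10.0]   # made-up values X_1 ... X_n
freqs = [3, 1, 2, 1, 2]               # made-up frequencies
median = weighted_median(values, freqs)
mean = sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

print(median, deviation_sum(values, freqs, median))
print(mean, deviation_sum(values, freqs, mean))
```

Here the median is 3 with a deviation sum of 22, while the deviation sum about the arithmetic mean is larger.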

The statical analogy to the median also is worth noting. With the same notation as before, the moment or turning effect of two forces, $f_1$, $f_2$, about O is

$$f_1x_1 + f_2x_2.$$

But in this case, if O be taken at some point in between $X_1$ and $X_2$, since the mean deviation sums the separate deviations without regard to sign, we must imagine one of the forces reversed so as to produce a turning effect in the same direction as before. The moment will then be still $(f_1x_1+f_2x_2)$, and it is less when O occupies such a position than when it is on $X_1X_2$ produced in either direction.

Taking O, therefore, somewhere in between $X_1$ and $X_2$, the moment may be written

$$= f_2(x_1 + x_2) + x_1(f_1 - f_2);$$

and, if $f_1>f_2$, this is least when $x_1$ vanishes, that is, when O coincides with $X_1$, but if $f_1=f_2$, the two forces constitute a couple, and the moment is the same whatever position O occupies between $X_1$ and $X_2$.

7. The Method of Least Squares. To the student who is unacquainted with the differential calculus, the following descriptive argument, the basis of the principle of least squares, for determining the values of $m$ and $c$ which make

$$(mx_1+c-y_1)^2 + (mx_2+c-y_2)^2 + \ldots + (mx_n+c-y_n)^2 \qquad (1)$$

a minimum, may prove instructive.

Let us call the above expression E and let us suppose that different values are given to $m$ while $c$ remains unchanged; in that case E will vary with $m$, and we might imagine the different values obtained for E plotted against the corresponding values of $m$ giving a curve of some type. Such a curve may rise and fall in wave-like fashion as in the figure, resulting in maximum points like A and C, and minimum points like B, where we define a maximum point to be such that, as we move away from it along the curve, whether to left or right, the size of the ordinate (and therefore the value of E) decreases; likewise, a minimum point is such that, as we move away from it, the ordinate (and therefore also E) increases. In the neighbourhood of such points it is clear that the size of the ordinate, such as A$a$ or B$b$, changes so slowly as to be practically stationary.

Suppose then that $m$ and $(m+\mu)$, $\mu$ being very small, are two values of $m$ respectively at and near a minimum position on the curve, i.e. a position like B corresponding to a minimum value for E. Since E near such a point does not differ appreciably from E at such a point, we may practically equate the two expressions obtained for E by substituting $(m+\mu)$ and $m$ respectively for $m$ in (1), thus

$$(\overline{m+\mu}\,x_1+c-y_1)^2 + (\overline{m+\mu}\,x_2+c-y_2)^2 + \ldots = (mx_1+c-y_1)^2 + (mx_2+c-y_2)^2 + \ldots$$

$$(mx_1+c-y_1+\mu x_1)^2 + (mx_2+c-y_2+\mu x_2)^2 + \ldots = (mx_1+c-y_1)^2 + (mx_2+c-y_2)^2 + \ldots$$

$$[(mx_1+c-y_1)^2 + 2\mu x_1(mx_1+c-y_1) + \mu^2x_1^2] + \ldots = (mx_1+c-y_1)^2 + \ldots$$

Thus, removing the terms common to both sides and dividing by $\mu$,

$$[2x_1(mx_1+c-y_1) + \mu x_1^2] + \ldots = 0.$$

Now, the smaller we take $\mu$, the nearer to the truth does this result become. Hence, by making $\mu$ tend to zero, we are led to the strictly true relation

$$x_1(mx_1+c-y_1) + \ldots = 0.$$

This is one of the equations in the text. To obtain the second, we keep $m$ constant and vary $c$.

Suppose $c$ and $(c+\gamma)$ are two values of $c$ at and near a minimum position on the curve; then, equating the two corresponding values of E, we have as before

$$(mx_1+\overline{c+\gamma}-y_1)^2 + \ldots = (mx_1+c-y_1)^2 + \ldots$$

$$(mx_1+c-y_1+\gamma)^2 + \ldots = (mx_1+c-y_1)^2 + \ldots$$

$$[(mx_1+c-y_1)^2 + 2\gamma(mx_1+c-y_1) + \gamma^2] + \ldots = (mx_1+c-y_1)^2 + \ldots$$

Thus

$$[2(mx_1+c-y_1) + \gamma] + \ldots = 0,$$

and, proceeding to the limit when $\gamma$ tends to zero, we reach the other equation in the text, namely,

$$(mx_1+c-y_1) + \ldots = 0.$$
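The two relations just reached, $\Sigma x(mx+c-y)=0$ and $\Sigma(mx+c-y)=0$, are two simple equations in $m$ and $c$. A minimal sketch of their solution follows; the data are invented and the function names are illustrative, not part of the text.

```python
def fit_line(xs, ys):
    """Solve  m*Sum(x^2) + c*Sum(x) = Sum(xy)  and  m*Sum(x) + n*c = Sum(y)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    c = (sy - m * sx) / n
    return m, c

def sum_of_squares(m, c, xs, ys):
    """The expression E of (1)."""
    return sum((m * x + c - y) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]        # made-up observations
ys = [2.2, 4.1, 5.9, 8.3, 9.8]
m, c = fit_line(xs, ys)
print(m, c, sum_of_squares(m, c, xs, ys))
# Small changes in m or c should only increase E.
print(sum_of_squares(m + 0.01, c, xs, ys), sum_of_squares(m, c - 0.01, xs, ys))
```

The values of E on the second line should both exceed the value on the first, in agreement with the descriptive argument above.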

[The  Method  of  Least  Squares  came  first  into  prominence  in 
Astronomy  in  connection  with  the  determination  of  the  best  value 
to  take  when  a  number  of  observations,  apparently  equally  reliable, 
give results not quite in agreement. If, for instance, x be the true
value of some variable, and if $x_1, x_2, x_3, \ldots, x_n$ be the results of
n observations, the method of least squares assumes x to be given
by making

$$y=(x-x_1)^2+(x-x_2)^2+\ldots+(x-x_n)^2$$

a minimum.

Now $$\frac{dy}{dx}=2(x-x_1)+2(x-x_2)+\ldots+2(x-x_n),$$ and this vanishes

when $$(x-x_1)+(x-x_2)+\ldots+(x-x_n)=0,$$

i.e. $$x=(x_1+x_2+\ldots+x_n)/n,$$

so  that  in  this  case  we  are  led  to  the  ordinary  arithmetic  mean  of 
the  n  observations  as  the  best  value. 

The  method  was  used  by  Gauss  as  early  as  1795.] 
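
The bracketed remark can be checked on a handful of numbers. This is a small sketch, assuming five invented observations; `sum_of_squares` is a hypothetical helper name.

```python
def sum_of_squares(x, observations):
    """y = (x - x_1)^2 + ... + (x - x_n)^2 for a trial value x."""
    return sum((x - obs) ** 2 for obs in observations)


# Hypothetical observations of one quantity.
observations = [10.2, 9.8, 10.1, 9.9, 10.5]
mean = sum(observations) / len(observations)

# Trial values on either side of the mean give a larger sum of squares.
nearby = [mean - 0.05, mean + 0.05]
for trial in nearby:
    print(trial, sum_of_squares(trial, observations))
print(mean, sum_of_squares(mean, observations))
```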


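8. To prove

$$\int_{-\infty}^{+\infty}e^{-x^2}\,dx=\sqrt{\pi}.$$

Let

$$I=\int_{-\infty}^{+\infty}e^{-x^2}\,dx;$$

thus, also,

$$I=\int_{-\infty}^{+\infty}e^{-y^2}\,dy;$$

therefore,

$$\begin{aligned}
I^2&=\int_{-\infty}^{+\infty}e^{-x^2}\,dx\int_{-\infty}^{+\infty}e^{-y^2}\,dy\\
&=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}e^{-(x^2+y^2)}\,dx\,dy\\
&=\int_{r=0}^{\infty}\int_{\theta=0}^{2\pi}e^{-r^2}\,r\,dr\,d\theta
\quad\text{(by changing to polar co-ordinates)}\\
&=\int_0^{\infty}e^{-r^2}\,r\,dr\int_0^{2\pi}d\theta\\
&=\Big[-\tfrac{1}{2}e^{-r^2}\Big]_0^{\infty}\Big[\theta\Big]_0^{2\pi}\\
&=(\tfrac{1}{2})(2\pi).
\end{aligned}$$

Hence $$I=\int_{-\infty}^{+\infty}e^{-x^2}\,dx=\sqrt{\pi}.$$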
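
The result of Note 8 can also be confirmed by direct numerical integration. The sketch below uses the trapezoidal rule over a finite range, on the assumption that the tails beyond |x| = 8 are negligible; the function name and default arguments are illustrative choices only.

```python
import math


def gaussian_integral(half_width=8.0, steps=1600):
    """Trapezoidal estimate of the integral of exp(-x^2) over [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # end ordinates count half
        total += weight * math.exp(-x * x)
    return total * h


estimate = gaussian_integral()
print(estimate, math.sqrt(math.pi))
```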

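9. To prove:

(1) $\Gamma(n+1)=n\Gamma(n)$. (2) $B(m,n)=\dfrac{\Gamma(m)\Gamma(n)}{\Gamma(m+n)}$.

(1) $$\begin{aligned}
\Gamma(n+1)&=\int_0^{\infty}x^ne^{-x}\,dx\\
&=-\int_{x=0}^{\infty}x^n\,d(e^{-x})\\
&=\Big[-x^ne^{-x}\Big]_0^{\infty}+n\int_0^{\infty}x^{n-1}e^{-x}\,dx\\
&=n\Gamma(n),
\end{aligned}$$

because the expression in square brackets vanishes at both limits.

(2) $$\begin{aligned}
\Gamma(m)\Gamma(n)&=\int_0^{\infty}e^{-\xi}\xi^{m-1}\,d\xi\int_0^{\infty}e^{-\eta}\eta^{n-1}\,d\eta\\
&=\int_0^{\infty}e^{-x^2}x^{2m-2}\,2x\,dx\int_0^{\infty}e^{-y^2}y^{2n-2}\,2y\,dy,
\end{aligned}$$

where $x^2=\xi$, $y^2=\eta$.

Hence $$\begin{aligned}
\Gamma(m)\Gamma(n)&=4\int_0^{\infty}\int_0^{\infty}e^{-(x^2+y^2)}x^{2m-1}y^{2n-1}\,dx\,dy\\
&=4\int_{r=0}^{\infty}\int_{\theta=0}^{\pi/2}e^{-r^2}r^{2m+2n-2}\cos^{2m-1}\theta\,\sin^{2n-1}\theta\;r\,dr\,d\theta
\end{aligned}$$

(by changing to polar co-ordinates).

Thus $$\begin{aligned}
\Gamma(m)\Gamma(n)&=\int_0^{\infty}e^{-r^2}(r^2)^{m+n-1}\,2r\,dr\int_0^{\pi/2}2\cos^{2m-1}\theta\,\sin^{2n-1}\theta\,d\theta\\
&=\int_0^{\infty}e^{-\rho}\rho^{m+n-1}\,d\rho\int_0^{1}\kappa^{n-1}(1-\kappa)^{m-1}\,d\kappa,
\end{aligned}$$

where $\rho=r^2$ and $\kappa=\sin^2\theta$;

therefore, $$\Gamma(m)\Gamma(n)=\Gamma(m+n)\,B(n,m)=\Gamma(m+n)\,B(m,n)$$

by symmetry.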
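
Both results of Note 9 lend themselves to a quick numerical check. The following sketch uses Python's standard `math.gamma` and a Simpson's-rule estimate of the Beta integral; the helper `beta_by_simpson` and the particular values of m and n are assumptions made for illustration (m and n are taken not less than 1 so that the integrand stays finite at the limits).

```python
import math


def beta_by_simpson(m, n, steps=1000):
    """Simpson estimate of B(m, n) = integral over [0, 1] of t^(m-1) (1-t)^(n-1).

    steps must be even; m and n should be at least 1.
    """
    h = 1.0 / steps

    def f(t):
        return t ** (m - 1) * (1.0 - t) ** (n - 1)

    total = f(0.0) + f(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0


m, n = 3.0, 4.0
beta_numeric = beta_by_simpson(m, n)
beta_from_gamma = math.gamma(m) * math.gamma(n) / math.gamma(m + n)

# Result (1) with a non-integral argument: Gamma(3.5) against 2.5 * Gamma(2.5).
recurrence_gap = math.gamma(3.5) - 2.5 * math.gamma(2.5)

print(beta_numeric, beta_from_gamma, recurrence_gap)
```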
10. Elementary Method of Testing the Probability Integral Table.

The  reader  may  find  more  satisfaction  in  using  the  probability 
integral  table  if  he  tests  for  himself  one  or  two  of  its  results  by 
means  of  squared  paper  or  in  some  other  way. 

We have seen that the probability of an error between 0 and $\xi$
is given by the expression

$$\frac{1}{\sqrt{2\pi}}\int_0^{\xi}e^{-\frac{1}{2}\xi^2}\,d\xi.$$

Put $\xi=\sqrt{2}\,x$, and this becomes

$$\frac{1}{\sqrt{\pi}}\int_0^{x}e^{-x^2}\,dx=\int_0^{x}e^{-x^2}\,dx\Big/\int_{-\infty}^{+\infty}e^{-x^2}\,dx,\ \text{by Note (8)}$$

$$=\text{area OBPN}/\text{area A'BA, in the figure.}$$

[Figure: the curve $y=e^{-x^2}$, showing the areas OBPN and A'BA.]

Now the graph of $y=e^{-x^2}$ is drawn in fig. (40) of the text, and it
is possible therefore to get an approximation to the above result for
any value of x by counting the number of small squares in that figure
enclosed by the areas corresponding to OBPN and A'BA respectively.
Each complete small square may be reckoned as 1, and each portion of
a square may be reckoned as 1 if it exceeds half a square and as zero
if it is less than half a square.

This gives, for example,

$$\frac{1}{\sqrt{\pi}}\int_0^{0.25}e^{-x^2}\,dx=98/707=0.139,$$

whereas the tables give 0.138.

For a value like x = 0.71, count the squares in the usual way
between curve, axes, and ordinate x = 0.70; then add to the result
one-fifth of the number of squares in the small slice of area between
curve, axis, and ordinates x = 0.70 and x = 0.75. We get

$$\frac{1}{\sqrt{\pi}}\int_0^{0.71}e^{-x^2}\,dx=240/707=0.339$$

as compared with 0.342 from the tables.

These  results  are  not  unsatisfactory  considering  the  rough  nature 
of  the  method  followed  to  obtain  them. 
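
The same test can be made without squared paper. The sketch below assumes that the tabulated quantity equals half the error function, and imitates the counting of squares by summing narrow strips under the curve; the function names and the choice of 100 strips are illustrative.

```python
import math


def table_value(x):
    """(1/sqrt(pi)) times the integral of exp(-t^2) from 0 to x, via math.erf."""
    return 0.5 * math.erf(x)


def strip_estimate(x, strips=100):
    """Same quantity from the mid-ordinates of narrow strips under y = exp(-t^2)."""
    h = x / strips
    area = sum(math.exp(-(((i + 0.5) * h) ** 2)) for i in range(strips)) * h
    return area / math.sqrt(math.pi)


for x in (0.25, 0.71):
    print(x, round(table_value(x), 3), round(strip_estimate(x), 3))
```

Under these assumptions the two values come out close to the 0.138 and 0.342 quoted from the tables.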
11. Bravais' Law of Frequency in the case of two Correlated
Variables with certain Deductions therefrom [based on Professor
Karl Pearson's memoir, Regression, Heredity and Panmixia (Phil.
Trans., vol. 187 A, pp. 253-318)].

Consider two variables whose deviations, x and y, from their
respective means are due to a number of independent causes, the
deviations in which from their means can be quantitatively denoted
by $\epsilon_1, \epsilon_2, \ldots, \epsilon_m$.

We assume that each $\epsilon$ deviation is so small compared to the
mean value from which it is measured that x and y can be sensibly
expressed as linear functions, thus

$$x=a_1\epsilon_1+a_2\epsilon_2+\ldots+a_m\epsilon_m \qquad (1)$$

$$y=b_1\epsilon_1+b_2\epsilon_2+\ldots+b_m\epsilon_m \qquad (2)$$

(Some of the a's and b's may be zero, and if x only involved, say,
$\epsilon_1, \epsilon_2, \ldots, \epsilon_k$, and y only involved
$\epsilon_{k+1}, \ldots, \epsilon_m$, then it would be
natural to expect no correlation between x and y.)

We further assume that each $\epsilon$ varies according to the normal
law with S.D. $\sigma$ with appropriate suffix.

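Equations (1) and (2) show that the same x and y may arise in a
multitude of different ways obtained by varying the $\epsilon$'s so that
their weighted sums (the a's and b's being the weights) remain
unaltered. The probability that the particular deviations lying
between

$$\epsilon_1\sim(\epsilon_1+\delta\epsilon_1),\quad \epsilon_2\sim(\epsilon_2+\delta\epsilon_2),\quad\ldots,\quad \epsilon_m\sim(\epsilon_m+\delta\epsilon_m)$$

shall concur, since they are all independent, is

$$z=\left(\frac{\delta\epsilon_1}{\sigma_1\sqrt{2\pi}}\,e^{-\epsilon_1^2/2\sigma_1^2}\right)\ldots\left(\frac{\delta\epsilon_m}{\sigma_m\sqrt{2\pi}}\,e^{-\epsilon_m^2/2\sigma_m^2}\right).$$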

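But, writing

$$a_3\epsilon_3+\ldots+a_m\epsilon_m=\alpha,\qquad b_3\epsilon_3+\ldots+b_m\epsilon_m=\beta,$$

equations (1) and (2) become

$$a_1\epsilon_1+a_2\epsilon_2+(\alpha-x)=0,$$
$$b_1\epsilon_1+b_2\epsilon_2+(\beta-y)=0.$$

Therefore

$$\frac{\epsilon_1}{a_2(\beta-y)-b_2(\alpha-x)}=\frac{\epsilon_2}{b_1(\alpha-x)-a_1(\beta-y)}=\frac{1}{a_1b_2-a_2b_1}.$$

And, for any function v,

$$\iint v\,dx\,dy=\iint v\left(\frac{\partial x}{\partial\epsilon_1}\frac{\partial y}{\partial\epsilon_2}-\frac{\partial x}{\partial\epsilon_2}\frac{\partial y}{\partial\epsilon_1}\right)d\epsilon_1\,d\epsilon_2=(a_1b_2-a_2b_1)\iint v\,d\epsilon_1\,d\epsilon_2.$$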
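Hence

$$z=\frac{\delta x\,\delta y\,\delta\epsilon_3\ldots\delta\epsilon_m}{(2\pi)^{m/2}\,\sigma_1\sigma_2\ldots\sigma_m\,(a_1b_2-a_2b_1)}\;
\exp\left\{-\frac{[a_2(\beta-y)-b_2(\alpha-x)]^2}{2\sigma_1^2(a_1b_2-a_2b_1)^2}-\frac{[b_1(\alpha-x)-a_1(\beta-y)]^2}{2\sigma_2^2(a_1b_2-a_2b_1)^2}-\frac{\epsilon_3^2}{2\sigma_3^2}-\ldots-\frac{\epsilon_m^2}{2\sigma_m^2}\right\}.$$

The total probability for deviations between $x\sim(x+\delta x)$ and
$y\sim(y+\delta y)$ is obtained by integrating z between limits $-\infty$ and $+\infty$
for all the $\epsilon$'s from $\epsilon_3$ to $\epsilon_m$, and it is not very difficult to see that
this will ultimately lead to an expression of the form

$$C\,\delta x\,\delta y\,e^{-(ax^2+2hxy+by^2)}.$$

This is Bravais' Law of Frequency.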

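To find the meanings of the constants a, b, h. The total probability
for a deviation between $x\sim(x+\delta x)$ associated with any deviation y is

$$C\,\delta x\int_{-\infty}^{+\infty}e^{-(ax^2+2hxy+by^2)}\,dy=C\,\delta x\,\sqrt{\pi/b}\;e^{-(ab-h^2)x^2/b}.$$

But if x be subject to the normal law, the probability for a
deviation between $x\sim(x+\delta x)$ is

$$\frac{\delta x}{\sqrt{2\pi}\,\sigma_x}\,e^{-x^2/2\sigma_x^2},$$

where $\sigma_x$ is the S.D. of x independent of y.
Comparing these two results, we have

$$1/2\sigma_x^2=(ab-h^2)/b=a(1-r^2),$$

if $r=-h/\sqrt{ab}$.

Similarly, $$1/2\sigma_y^2=(ab-h^2)/a=b(1-r^2),$$

so that $$h=-r\sqrt{ab}=-\frac{r}{2\sigma_x\sigma_y(1-r^2)}.$$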

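Again, we may integrate z for all values of x and y, and so get
the total frequency, N, of the (x, y) pair.

$$\begin{aligned}
N&=C\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}e^{-(ax^2+2hxy+by^2)}\,dx\,dy\\
&=C\sqrt{\pi/b}\int_{-\infty}^{+\infty}e^{-(ab-h^2)x^2/b}\,dx\\
&=C\sqrt{\pi/b}\,\sqrt{\pi b/(ab-h^2)}.
\end{aligned}$$

Hence

$$C=\frac{N}{\pi}\sqrt{ab-h^2}=\frac{N}{\pi}\sqrt{ab(1-r^2)}=\frac{N}{2\pi\sigma_x\sigma_y\sqrt{1-r^2}}.$$

Thus

$$z=C\,\delta x\,\delta y\,\exp\left\{-\frac{1}{2(1-r^2)}\left[\frac{x^2}{\sigma_x^2}-\frac{2rxy}{\sigma_x\sigma_y}+\frac{y^2}{\sigma_y^2}\right]\right\},$$

where C has the above value.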
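
The constants a, b, h and C can be computed from given values of the two standard deviations and r, and the surface summed numerically to see that it returns the total frequency N. The sketch below does this under assumed values for the standard deviations, r and N; the function names, the grid width of eight standard deviations and the step count are illustrative choices.

```python
import math


def bravais_constants(sx, sy, r, total=1.0):
    """Constants a, b, h, C of the surface C * exp(-(a x^2 + 2 h x y + b y^2))."""
    a = 1.0 / (2.0 * sx * sx * (1.0 - r * r))
    b = 1.0 / (2.0 * sy * sy * (1.0 - r * r))
    h = -r * math.sqrt(a * b)
    c = total * math.sqrt(a * b - h * h) / math.pi
    return a, b, h, c


def total_frequency(sx, sy, r, total=1.0, widths=8.0, steps=400):
    """Sum the surface over a grid of small rectangles (mid-point rule)."""
    a, b, h, c = bravais_constants(sx, sy, r, total)
    dx = 2.0 * widths * sx / steps
    dy = 2.0 * widths * sy / steps
    acc = 0.0
    for i in range(steps):
        x = -widths * sx + (i + 0.5) * dx
        for j in range(steps):
            y = -widths * sy + (j + 0.5) * dy
            acc += math.exp(-(a * x * x + 2.0 * h * x * y + b * y * y))
    return c * acc * dx * dy


# Assumed illustrative values.
sx, sy, r, N = 1.0, 2.0, 0.6, 500.0
a, b, h, C = bravais_constants(sx, sy, r, N)
print(C, N / (2.0 * math.pi * sx * sy * math.sqrt(1.0 - r * r)))
print(total_frequency(sx, sy, r, N))
```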

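It still remains to interpret r and to see that it is really the
coefficient of correlation as defined in Chapter x. For this purpose
let us suppose we have observed n pairs of associated x's and y's,
namely

$$(x_1y_1),\ (x_2y_2),\ \ldots,\ (x_ny_n).$$

The probability for such a concurrence, taken along with a given
value for r and assuming the observations independent, is
proportional to

$$\frac{1}{\sqrt{1-r^2}}\exp\left\{-\frac{1}{2(1-r^2)}\left[\frac{x_1^2}{\sigma_x^2}-\frac{2rx_1y_1}{\sigma_x\sigma_y}+\frac{y_1^2}{\sigma_y^2}\right]\right\}\times\ldots\times\frac{1}{\sqrt{1-r^2}}\exp\left\{-\frac{1}{2(1-r^2)}\left[\frac{x_n^2}{\sigma_x^2}-\frac{2rx_ny_n}{\sigma_x\sigma_y}+\frac{y_n^2}{\sigma_y^2}\right]\right\}$$

$$=\frac{1}{(1-r^2)^{n/2}}\exp\left\{-\frac{1}{2(1-r^2)}\left[\frac{\Sigma x^2}{\sigma_x^2}-\frac{2r\Sigma xy}{\sigma_x\sigma_y}+\frac{\Sigma y^2}{\sigma_y^2}\right]\right\}$$

$$=(1-r^2)^{-n/2}\exp\left\{-\frac{n(1-\kappa r)}{1-r^2}\right\},$$

where $\kappa=\Sigma xy/n\sigma_x\sigma_y$,

$$=\exp\left\{-\frac{n}{2}\log(1-r^2)-\frac{n(1-\kappa r)}{1-r^2}\right\}.$$

Now the probability of this particular distribution is greatest
when

$$\tfrac{1}{2}\log(1-r^2)+\frac{1-\kappa r}{1-r^2}$$

is least, and, differentiating with respect to r, this leads to

$$-\frac{r}{1-r^2}+\frac{(1-r^2)(-\kappa)+2r(1-\kappa r)}{(1-r^2)^2}=0,$$

i.e. $$-r(1-r^2)-\kappa(1-r^2)+2r(1-\kappa r)=0,$$

i.e. $$-r+r^3-\kappa+\kappa r^2+2r-2\kappa r^2=0,$$

i.e. $$(r-\kappa)(1+r^2)=0.$$

It is not difficult to show that $r=\kappa$ gives a minimum; hence the
required probability is a maximum and we get the best value for
the coefficient by taking

$$r=\Sigma xy/n\sigma_x\sigma_y.$$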
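
The conclusion of this note can be tried numerically: for a set of paired deviations, scan values of r between -1 and +1 and see where the expression to be minimised is least. The sketch below assumes five invented pairs and a scanning step of 0.001; the names `kappa` and `objective` are illustrative.

```python
import math


def kappa(xs, ys):
    """Sum of products of deviations divided by n * sigma_x * sigma_y."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sigma_x = math.sqrt(sum(d * d for d in dx) / n)
    sigma_y = math.sqrt(sum(d * d for d in dy) / n)
    return sum(p * q for p, q in zip(dx, dy)) / (n * sigma_x * sigma_y)


def objective(r, k):
    """(1/2) log(1 - r^2) + (1 - k r) / (1 - r^2), to be made least."""
    return 0.5 * math.log(1.0 - r * r) + (1.0 - k * r) / (1.0 - r * r)


# Hypothetical paired observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]
k = kappa(xs, ys)

grid = [i / 1000.0 for i in range(-999, 1000)]
best_r = min(grid, key=lambda r: objective(r, k))
print(k, best_r)
```

The value of r found by scanning should agree with the product-moment value to within the step of the scan.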
CERTAIN CURRENT SOURCES OF SOCIAL STATISTICS

Any one who is anxious to get reliable figures bearing upon some
social matter is somewhat at a pause unless he is thoroughly
conversant with all the statistical ramifications of Government
authorities, local and national, of trade unions, friendly societies,
and hosts of other bodies of a public or semi-public character.

While  recognizing  the  lavish  outpouring  of  statistics  of  all  kinds 
upon  a  multitude  of  diverse  topics  every  year,  and  appreciating  the 
immense  care  and  patience  shown  by  those  who  are  responsible  for 
their  collection  and  preparation,  one  cannot  but  deplore  the  lack 
of  any  co-ordinating  principle  in  general  between  one  body  and 
another  either  in  deciding  what  statistics  shall  be  collected,  by 
whom and when they shall be collected, or how afterwards they
shall  be  tabulated  and  presented  to  the  public.  Too  often  a  narrow- 
minded  jealousy  prevents  one  authority  from  consulting  with 
another,  and  such  co-operation  as  does  exist  is  due  largely  to  the 
efforts  of  able  and  enlightened  individuals.  The  result  is  that  a 
vast  amount  of  labour  and  expense  goes  waste  and  the  loss  to  the 
public  is  incalculable,  but  the  public  do  not  care,  and  they  do  not 
care  because  they  do  not  know. 

At  present,  to  quote  from  an  influential  petition  on  the  subject 
recently presented to His Majesty's Government, 'It is almost
universally  the  case  that  any  serious  investigation  is  reduced  to 
roughly  approximate  estimates  in  relation  to  some  factor  which  is 
essential  for  its  result.  ...  It  is  not  too  much  to  say  that  there  is 
hardly  any  reform,  financial,  social,  or  commercial,  for  which  adequate 
information  can  be  provided  with  our  present  machinery.'  But 
this  state  of  things  would  be  partly  remedied  by  adequate  control 
such as might be secured by the establishment of a central statistical
office with a minister in charge who should be responsible for
unification  so  far  as  possible  in  the  collection,  tabulation,  and  issue 
of  all  public  statistics. 

It  is  scarcely  possible  for  a  single  private  individual  to  make 
a  quantitative  investigation  of  any  social  question  on  a  large  enough 
scale  to  produce  results  of  real  value  ;  conspicuous  instances  like 
Booth  and  Rowntree  might  seem  to  be  exceptions  to  this  rule,  but 
even  they  had  a  number  of  workers  acting  under  their  direction, 
without  whose  aid  their  task  would  have  seemed  almost  hopeless. 
For such statistics as we have we are therefore dependent upon
Government  departments,  local  authorities,  public  officials,  trade 
associations  representing  employers  or  labour,  public  companies, 
and  so  on.  The  reader  who  wishes  to  get  some  idea  of  the  extent 
and  the  limitations  of  official  British  statistics  is  referred  to  the 
admirable  introductory  chapters  of  Bowley's  Elements  of  Statistics. 
Here  we  cannot  do  more  than  mention  a  very  few  of  the  most 
important  sources  whence  such  statistics  are  derived. 

The  most  voluminous  of  all  our  records  is  probably  the  Census 
of  the  Population  which  is  taken  every  ten  years.  Its  scope  is  but 
faintly  realized  by  enumerating  the  chief  subjects  on  which  the 
Registrar-General asked information from each householder in 1911,
namely : 

(1)  Numbers  and  Geographical  Distribution  of  the  Population. 

(2)  Nationality  and  Birth-place. 

(3)  Numbers  at  Different  Ages,  Male  and  Female. 

(4)  Numbers  Single,  Married,  and  Widowed. 

(5) Sizes of Families, including Children Dead.

(6)  Numbers  engaged  in  different  Professions  and  Occupations. 

(7)  Numbers  Blind,  Deaf,  Dumb,  not  in  their  Right  Mind. 

(8)  Numbers  occupying  Dwellings  of  Different  Sizes  as  measured 
by  the  Number  of  Rooms. 

This  may  seem  an  ambitious  scheme  when  it  is  stated  that  the 
mere  enumeration  of  the  people  was  successfully  opposed  less  than 
two hundred years ago as 'subversive of the last remains of English
liberty and likely to result in some public misfortune or an
epidemical disorder,' and the first census was only taken in 1801. [See
Article  in  the  Encyclopaedia  Britannica  on  the  subject.] 

The  results  of  each  census  are  published  in  bulky  volumes  as 
soon  as  they  can  be  reduced  and  tabulated,  a  process  which,  of 
course,  takes  a  considerable  time  even  for  an  army  of  workers 
with calculating machines and every modern device to facilitate
their  progress.  It  is  to  be  regretted  that  more  is  not  done  to 
advertise  so  valuable  a  record  of  work  by  publication  in  a  cheap 
and  attractive  form  of  a  summary  of  matters  which  vitally  affect 
the  good  of  the  commonwealth.  As  it  is,  the  census  volumes  tend 
to be purchased only by public authorities and officials who require
to  use  them  occasionally  as  books  of  reference. 

Neglect  of  the  blandishments  of  advertisement — to  be  commended 
in  general  because  such  neglect  is  somehow  associated  with  the 
presentation  of  all  truth — may  be  perhaps  carried  too  far  in  the 
issue  of  statistics. 
It will be noted that in the periodical census no mention is made
of  wages  though  the  people  are  classified  as  regards  occupation, 
and  for  information  upon  this  point  we  must  turn  to  another  source. 
The  last  general  census  of  wages  was  taken  in  1906,  following 
and  improving  upon  an  earlier  inquiry  twenty  years  before,  but, 
in  connection  with  an  inquiry  by  the  Board  of  Trade  into  the  cost 
of  living  of  the  working  classes,  information  was  collected  as  to 
rates  of  wages  in  1912  of  workpeople  in  certain  occupations  in  the 
building,  engineering,  and  printing  trades,  these  being  selected  as 
industries  common  to  most  towns,  and  because  the  time  rates  of 
wages  paid  in  them  are  largely  standardized. 

The  1906  inquiry  into  earnings  and  hours  of  labour,  unlike  the 
decennial  census,  was  conducted  on  a  voluntary  basis  and  was 
never  wholly  completed.  In  brief  it  set  out  to  discover  from 
employers  : — 

(1)  The  Numbers  of  Working-people  Employed  in  Various 
Occupations,  distinguishing  Men,  Women,  Lads,  and  Girls. 

(2)  The  Nature  of  the  Work  done  and  the  Rates  of  Wages  Paid, 
distinguishing  Time  Rates  from  Piece  Rates. 

(3)  The  Hours  Worked,  distinguishing  Under-  or  Over-time  from 
Normal  Time. 

The ground actually covered by the inquiry embraces the following
trades: Textiles, Clothing, Building and Woodworking, Public
Utility Services, Metal, Engineering, and Shipbuilding (in 1906);
also Agriculture, and Railway Service (in 1907); the reports upon
these trades were published separately at different dates between
1909 and 1912, and the following trades were bulked together in
one volume, published in 1913: Paper and Printing; Pottery,
Brick, Glass, and Chemicals; Food, Drink, and Tobacco; and
Miscellaneous Trades.

The  Cost  of  Living  Inquiry  of  1912  was  in  continuation  of  a 
similar  inquiry  in  1905,  which  in  addition  compared  conditions  in 
the  United  Kingdom  and  certain  foreign  countries.  It  dealt  not 
only  with  wages  but  also  with  rents  and  retail  prices. 

The report states that 'particulars as to the rent and
accommodation of typical working-class dwellings were obtained from
officials of local authorities, surveyors of taxes, house owners and
agents, and by house-to-house inquiry.' Also 'returns of the
prices  most  generally  paid  by  working-class  customers  for  a  number 
of  specified  commodities  were  obtained  in  each  town  by  personal 
inquiry  from  a  number  of  retailers  engaged  in  working-class  trade.' 

Since then Lord Sumner's Committee and a Committee of the
Agricultural Wages Board have examined the change in the cost of
living between 1914 and 1919, as evidenced by a number of household
budgets collected from among urban working-classes and
workers in rural districts respectively.

One  other  highly  important  inquiry  carried  out  by  the  Board  of 
Trade  deserves  notice,  namely,  the  First  Census  of  Production  of  the 
United  Kingdom  (1907). 

The  published  report  shows  : — 

(1)  The  total  Net  Output  in  Money  Value  for  each  Trade  Group 
in  each  Industry. 

(2)  The  Number  of  Persons  Employed  in  each  Trade  Group 
(salaried  persons  and  wage-earners  exclusive  of  outworkers). 

(3)  The  Net  Output  per  Person  Employed  in  each  Trade  Group 
as  deduced  from  (1)  and  (2). 

(4)  The  Horse-power  of  Engines  in  Mines,  Quarries,  or  Factories 
Employed  in  each  Trade  Group. 

It  is  explained  that  the  term  '  net  output '  here  represents  the 
value  of  the  aggregate  output  of  the  factories,  etc.,  from  which 
returns  were  received  in  each  trade  group,  after  deducting  the  cost 
of  materials  purchased  from  factories,  etc.,  not  included  in  the 
group,  or  supplied  by  merchants  or  others  not  making  returns  to 
the  Census  of  Production  Office. 

Valuable  as  the  results  of  these  inquiries  undoubtedly  are,  they 
would  be  of  still  more  value  were  it  only  possible  satisfactorily  to 
collate  the  various  returns  of  population,  wages,  and  production. 
No  record  of  wages  was  included,  for  example,  in  the  Census  of 
Production  statistics,  and  it  is  quite  impossible  to  deduce  the  number 
of  wage-earners  and  those  dependent  upon  them  in  any  trade  at 
any  given  time. 

Apart,  however,  from  such  special  inquiries  as  we  have  instanced, 
and  the  ten-yearly  census  of  the  people,  there  are  other  periodical 
records  issued  which  provide  us  with  valuable  information.  The 
Ministry  of  Labour,  until  recently  a  special  branch  of  the  Board 
of  Trade,  charged  with  the  duty  of  keeping  in  touch  with  labour 
conditions,  issues  each  month  a  Labour  Gazette  giving  particulars 
relating  to  the  state  of  employment  in  the  principal  trades  in  the 
United  Kingdom  based  on  returns  from  employers,  trade  unions, 
and  employment  exchanges,  besides  information  concerning  trade 
disputes,  changes  in  wages  and  hours,  the  course  of  prices,  railway 
traffic receipts, foreign trade, etc. The Board of Trade also
publishes weekly a Journal and Commercial Gazette dealing with matters
of interest to all who are engaged in commerce or finance; while a
Monthly Bulletin of Statistics of production, trade, finance,
employment, etc., at present issued under the name of the Supreme
Economic Council, is an important recent addition to our knowledge
of international statistics.

Again the Registrar-General makes a quarterly return and annual
summary  of  births,  marriages,  and  deaths  in  the  different  counties 
of  England  and  Wales,  and  of  births,  deaths,  and  infectious  diseases 
in  certain  large  towns.  In  each  public  health  area  the  medical  officer 
reports  periodically  upon  the  hygienic  condition  of  the  district  and 
the  health  of  the  people  under  his  care.  The  Board  of  Education 
is  answerable  for  conditions  in  the  schools,  and  the  Home  Office 
in  factories  and  prisons  ;  they  report  from  time  to  time.  The 
Ministry  of  Health  similarly  issues  returns  relating  to  pauperism 
and  to  housing,  while  the  Board  of  Agriculture  and  Fisheries  registers 
the acreage under crops and the number of live stock in the United
Kingdom,  and  the  Commissioners  of  Customs  record  the  expansion 
or  contraction  of  foreign  trade. 

In addition we have the endless accounts and statistics supplied,
some  voluntarily  and  some  compulsorily,  by  municipal  bodies, 
public  companies,  banks,  trade  associations,  co-operative  societies, 
insurance  companies,  trade  unions,  etc. 

And  yet,  in  spite  of  all  this  wealth  of  statistics,  some  surprising 
gaps  occur,  as  we  have  already  seen,  in  important  particulars 
which cannot be traced. We shall quote only one more instance
of such a hiatus: the income-tax returns provide a basis for
measuring that part of the national income which is subject to taxation,
some  idea  also  can  be  formed  of  what  the  wage-earners  receive, 
but  as  to  the  earnings  of  the  portion  of  the  community  falling  in 
between  these  two  classes  we  are  entirely  ignorant.  It  is  possible 
that  war  conditions  during  the  years  1914-19  may  have  vastly 
increased  the  knowledge  of  the  Government  as  to  some  matters 
such  as  internal  resources  and  inland  trade,  of  which  little  was 
known  before,  but,  if  so,  the  public,  whom  it  concerns  so  closely, 
have  not  yet  been  permitted  fully  to  share  in  this  advantage. 

For an excellent summary of labour statistics compiled or
collected by the Government the reader is recommended to consult
the  Annual  Abstract  of  Labour  Statistics  of  the  United  Kingdom, 
published  in  the  past  by  the  Labour  Department  of  the  Board  of 
Trade. 



A NOTE ON TABLES TO AID CALCULATION

The  short  tables  which  follow  are  only  inserted  as  specimens,  as 
it  is  expected  that  the  reader  who  wishes  to  make  extensive  use 
of  such  tables  will  have  access  to  the  fuller  ones  to  which  reference 
is  made  below. 


[Fig. (55)]

Probability Integral Table, giving area of curve z in terms of
corresponding abscissa, see fig. (55):

Fig. (56), the result of plotting $\alpha$ against $\xi$, enables us to estimate
the probability of an error lying between any two limits.

Table giving P, to test 'goodness of fit,' corresponding to certain
values of n' and $\chi^2$:


n' 

7 

x2->4 

5 

6 

7 
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11 

12 
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•67668 

•54381 
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•32085 
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•17358 
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•04304 
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8 
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•13862 

•10056 '  ^07211 
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9 
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•43347 
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•15120  -11185 

•08176 ,  05914 

10 
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•53415 
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-21331  16261 

•12232  09094 
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•35752  i  •2850()  -22367 

•17299 :  13206 

12 
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•4432(3  -36264  •29333 
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•98344 

•95798 

•91608 

•85761 

•78513 

•70293 

•61596 

•52892  -44568  -36904 
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14 
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•77294  ^69393 

•61082  -52764  -44781 

•37384 

•30735 

15 

•99547 

•98581 

•96649 

•93471  ^88933 

•83105  ^76218 

•08604  j -60630  i  -52652 

-44971 

•37815 
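
Values of P of the kind tabulated here can be computed directly. The sketch below assumes that P is the upper-tail probability of $\chi^2$ with n' - 1 degrees of freedom (an assumption about the table's convention, not stated in the text above) and uses the standard finite series for even and odd degrees of freedom; the function name `goodness_of_fit_p` is illustrative.

```python
import math


def goodness_of_fit_p(chi2, n_prime):
    """Upper-tail probability P for chi-squared, taking n' - 1 degrees of freedom."""
    dof = n_prime - 1
    if dof < 1 or chi2 < 0:
        raise ValueError("need n' >= 2 and chi2 >= 0")
    if dof % 2 == 0:
        # Even case: exp(-chi2/2) * [1 + (chi2/2) + (chi2/2)^2/2! + ...], dof/2 terms.
        term, total = 1.0, 1.0
        for j in range(1, dof // 2):
            term *= (chi2 / 2.0) / j
            total += term
        return math.exp(-chi2 / 2.0) * total
    # Odd case: normal tail plus a finite series in odd powers of chi.
    chi = math.sqrt(chi2)
    tail = math.erfc(chi / math.sqrt(2.0))
    term, total = chi, 0.0
    for k in range(1, (dof - 1) // 2 + 1):
        if k > 1:
            term *= chi2 / (2 * k - 1)
        total += term
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-chi2 / 2.0) * total


print(round(goodness_of_fit_p(4.0, 7), 5))
print(round(goodness_of_fit_p(8.0, 9), 5))
```

Under this assumption the entries .67668 and .43347 that survive in the table would appear to correspond to n' = 7, $\chi^2$ = 4 and n' = 9, $\chi^2$ = 8 respectively.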

One of the earliest tables of the probability integral appeared in
Kramp's Analyse des Refractions (Strasbourg, 1798), where the
calculation of $\int e^{-x^2}\,dx$ was given to eight places from x = 0 to x = 3
at intervals of 0.01. Tables more recent and extensive are those
due to J. Burgess (Trans. Roy. Soc. Edin. 1900) and to W. F.
Sheppard (Biometrika, vol. ii., pp. 174-190). Of these the latter
is reproduced in the admirable Tables for Statisticians and
Biometricians, edited by Karl Pearson (Camb. Univ. Press, 1914), and
the same volume also contains Palin Elderton's P Tables for testing
'goodness of fit' which first appeared in Biometrika, vol. i., and
Duffell's Tables of the Logarithms of the $\Gamma$ Function from Biometrika,
vol. vii., besides a large number of other valuable tables.

It should be remarked in connection with the last-named table that
the formula $\Gamma(x+1)=x\,\Gamma(x)$ enables us to reduce the calculation
of any $\Gamma$ function to one in which x lies between 1 and 2, by repeated
applications of the logarithmic relation, thus

$$\log\Gamma(x+1)=\log x+\log\Gamma(x)=\log x+\log(x-1)+\log\Gamma(x-1),$$

and so on. When x is large, however, say greater than 10, the
well-known approximate formula

$$\Gamma(x+1)=\sqrt{2\pi x}\;x^xe^{-x}\,e^{1/12x}$$

(see, for instance, Whittaker's Analysis, § 110) will be found useful,
and it may also be written

$$\log\frac{\Gamma(x+1)}{x^xe^{-x}}=0.3990899+\frac{0.0361912}{x}+\tfrac{1}{2}\log x,$$

a form often convenient.
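
The accuracy of the logarithmic form for large x can be judged by comparing it with a library value. The sketch below assumes common (base 10) logarithms, takes the constant 0.3990899 as the logarithm of the square root of 2π, and takes the coefficient of 1/x as one-twelfth of the logarithm of e, which is an assumption made here; `math.lgamma` from the Python standard library supplies the comparison value.

```python
import math

LOG10_SQRT_2PI = math.log10(math.sqrt(2.0 * math.pi))  # about 0.3990899
LOG10_E_OVER_12 = math.log10(math.e) / 12.0            # assumed coefficient of 1/x


def log10_gamma_approx(x):
    """Approximate log10 of Gamma(x + 1) for large x."""
    log_ratio = LOG10_SQRT_2PI + LOG10_E_OVER_12 / x + 0.5 * math.log10(x)
    # Restore the factor x^x e^(-x) that the ratio divides out.
    return log_ratio + x * math.log10(x) - x * math.log10(math.e)


def log10_gamma_library(x):
    """log10 of Gamma(x + 1) from the standard library, for comparison."""
    return math.lgamma(x + 1.0) / math.log(10.0)


for x in (10.0, 25.5, 100.0):
    print(x, log10_gamma_approx(x), log10_gamma_library(x))
```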

It may be of service to record here the values of a few constants
which frequently recur for speedy reference:

$$\begin{array}{lll}
e=2.718\,2818 & \pi=3.141\,5926 & \log_{10}2=0.301\,0300\\[4pt]
\dfrac{1}{e}=0.367\,8794 & \log_{10}\pi=0.497\,1499 & \log_{10}3=0.477\,1213\\[4pt]
\log_{10}e=0.434\,2945 & \log_{10}(\log_{10}e)=\bar{1}.637\,7843 & \log_{10}\dfrac{1}{\sqrt{2\pi}}=\bar{1}.600\,9101
\end{array}$$

The  statistician  who  has  Pearson's  Tables,  Barlow's  Tables  of 
Squares, etc., together with a good set of Tables of Logarithms
(unless  he  is  so  fortunate  as  to  have  a  mechanical  calculator,  for 
instance  a  Brunsviga,  at  his  disposal)  and  of  Trigonometrical 
Functions  such  as  Chambers's  Seven-Figure  Tables,  may  consider 
himself  amply  provided  for  serious  research  and  decidedly  better 
off  than  his  predecessors  who  prepared  the  way  for  him  by  doing 
great  work  with  much  poorer  tools. 


Edinburgh: Printed by T. and A. Constable Ltd.